Prioritization of River Restoration by Coupling Soil and Water Assessment Tool (SWAT) and Support Vector Machine (SVM) Models in the Taizi River Basin, Northern China

Identifying priority zones for river restoration is important for biodiversity conservation and catchment management. However, field data are difficult to collect and therefore limited, so research is needed to better understand the ecological status within a catchment and to develop targeted planning strategies for river restoration. To address this need, coupled hydrological and machine learning models were constructed to identify priority zones for river restoration, based on a dataset of aquatic organisms (i.e., algae, macroinvertebrates, and fish) and physicochemical indicators collected from 130 sites in the Taizi River, northern China, in September 2014. A process-based Soil and Water Assessment Tool (SWAT) model was developed to simulate the temporal-spatial variations in environmental indicators, and a support vector machine (SVM) model was applied to explore the relationships between aquatic organisms and environmental indicators. Biological indices in different hydrological periods were then simulated by coupling the SWAT and SVM models. Results indicated that aquatic biological indices and physicochemical indicators exhibited clear temporal and spatial patterns, which were more evident in the upper reaches than in the lower reaches. The ecological status of the Taizi River was better in the flood season than in the dry season. Priority zones were identified for different hydrological seasons by setting target values for ecological restoration based on biological indicators, and the results suggest that hydrological conditions influenced restoration prioritization more strongly than other environmental parameters did. Our approach could be applied to other seasonal river ecosystems to provide useful references for river restoration.


Introduction
The ecological status of rivers is closely related to human society. Human activities such as industry, agriculture, and construction may affect important ecological functions and processes, such as nutrient cycling and carbon flux in food webs [1][2][3], alter hydrological regimes, and lead to habitat degradation. The response of river ecosystems to these activities varies across temporal and spatial scales, which poses a challenge for river remediation and flow regulation [4,5]. There is great interest in understanding how the ecological status of rivers changes with temporal and

The Framework of Model Coupling
The generation and migration of non-point source pollution in a basin are mainly determined by three processes: the surface rainfall-runoff process, the rainfall-runoff erosion process, and the leaching of soil by surface, soil, and underground runoff. The pollutants then enter the water body and affect the distribution of its physicochemical indicators (Figure 1). The SWAT model, which is based on physical processes, was applied at the watershed scale in the Taizi River, northern China, to simulate hydrological conditions and explore the temporal-spatial dynamics of physicochemical indicators. The SVM model was then employed to analyze the response of aquatic organisms to environmental factors. The SWAT outputs were used as inputs to the SVM model in order to simulate the temporal-spatial distribution of aquatic organisms, and the results were validated against measured data (Figure 1).
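The coupling step amounts to a simple data flow: SWAT-simulated environmental factors for each sub-basin and period are arranged into feature vectors and passed to a fitted biological-index model. The Python sketch below assumes a hypothetical dictionary layout for the SWAT outputs and treats the trained SVM as an opaque `predict_index` callable; neither reflects the authors' actual code or file formats.

```python
from typing import Callable, Dict, List

# Hypothetical layout: {period label: {sub-basin id: {factor name: value}}}.
SwatOutputs = Dict[str, Dict[int, Dict[str, float]]]


def couple_swat_svm(
    swat_outputs: SwatOutputs,
    predict_index: Callable[[List[float]], float],
    feature_order: List[str],
) -> Dict[str, Dict[int, float]]:
    """Predict a biological index for every sub-basin and period.

    predict_index stands in for a fitted SVM regressor (an assumption here);
    feature_order must match the variable order used when the model was trained.
    """
    predictions: Dict[str, Dict[int, float]] = {}
    for period, basins in swat_outputs.items():
        predictions[period] = {}
        for basin_id, factors in basins.items():
            # Build the feature vector in the same order used for training.
            features = [factors[name] for name in feature_order]
            predictions[period][basin_id] = predict_index(features)
    return predictions


def linear_stand_in(x: List[float]) -> float:
    """Toy replacement for a trained SVM, for illustration only."""
    return 20 - 2 * x[0] + 0.5 * x[1]


if __name__ == "__main__":
    outputs = {"2014-09": {1: {"TN": 1.0, "DO": 8.0}, 2: {"TN": 3.0, "DO": 5.0}}}
    print(couple_swat_svm(outputs, linear_stand_in, ["TN", "DO"]))
```

Keeping the feature order explicit matters because an SVM sees only a numeric vector; swapping two factors would silently change the predictions.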

Study Area
The Taizi River is located in the southeast of Liaoning Province, northern China (122°23' E-122°53' E, 40°28' N-41°39' N; Figure 2). It flows through Benxi City, Liaoyang City, Anshan City, and Haicheng City, and has a basin area of 13,900 km² and a length of 413 km. Located in the warm temperate sub-humid zone, the Taizi River Basin has a continental monsoon climate. The upper reaches are characterized by low hills, with most river channels running through valleys, relatively little human exploitation, and high vegetation coverage. Many rare aquatic organisms have been recorded in this region, such as clean-water fishes (Lampetra morii, Odontobutis obscurus, etc.) and large clean-water macroinvertebrates (Epeorus melli, Cambaroides dauricus). In contrast, the middle and lower reaches lie in a plain, with terrain that is higher in the southeast and lower in the northwest, and the river channels there meander. Denser industry and greater human disturbance have led to excessive land use in the lower reaches. In recent years, urbanization has placed increasing pressure on the Taizi River Basin, resulting in water quality deterioration, habitat degradation, and biodiversity loss [21,22].

Fish samples were collected by electrofishing and gill netting, and all fish were identified, counted, and weighed in situ. Rare or unknown species were preserved in 4% formalin for identification in the laboratory. Benthic macroinvertebrates were collected using a Surber net (30 cm × 30 cm, 500 µm mesh) and a D-frame dip net (15 cm radius, 500 µm mesh), and were identified to genus level in the laboratory. Benthic algae were collected from all available substrates and habitats at each site and identified to species level in the laboratory. Physicochemical parameters were either measured in situ (i.e., DO and EC) or determined from water samples in the laboratory (i.e., NH3-N, COD, BOD5, TP, and TN), according to the Chinese Water Quality Standard Methods [24].
Eight biological indicators were used in the SVM model: fish species richness (F_S), fish index of biotic integrity (F_IBI), fish Berger-Parker index (F_BP), macroinvertebrate family richness (M_S), Biological Monitoring Working Party score (M_BMWP), Ephemeroptera, Plecoptera, and Trichoptera family richness (M_EPT), algae species richness (A_S), and algae Berger-Parker index (A_BP). Among these, F_S, F_IBI, and F_BP respond to physical, chemical, biological, and zoogeographic factors as well as to long-term pressures [21]. M_S measures macroinvertebrate diversity and reflects the general deterioration of water quality [25]. M_BMWP is used to assess organic pollution in freshwaters [26]. M_EPT is the family richness within these three insect orders, which are sensitive to contamination [27]. A_S and A_BP both reflect water quality deterioration related to eutrophication and organic pollution [21]. These indicators and their related impact typologies are listed in Table 1.
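Several of these indices are simple functions of a per-site taxon count table. As a minimal sketch, assuming abundances stored as a mapping from taxon name to count, richness is the number of taxa present, the Berger-Parker index is the share of the most abundant taxon (N_max / N), and EPT richness counts families belonging to the three orders. The family-to-order lookup is a hypothetical input; F_IBI and BMWP require scoring tables that are not given here and are omitted.

```python
from typing import Mapping

EPT_ORDERS = {"Ephemeroptera", "Plecoptera", "Trichoptera"}


def taxa_richness(counts: Mapping[str, int]) -> int:
    """Number of taxa (species, genera or families) with at least one individual."""
    return sum(1 for n in counts.values() if n > 0)


def berger_parker(counts: Mapping[str, int]) -> float:
    """Berger-Parker dominance d = N_max / N, the share of the most abundant taxon."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no individuals")
    return max(counts.values()) / total


def ept_richness(family_orders: Mapping[str, str], counts: Mapping[str, int]) -> int:
    """Families present whose order (from a hypothetical lookup) is E, P or T."""
    return sum(
        1
        for family, n in counts.items()
        if n > 0 and family_orders.get(family) in EPT_ORDERS
    )
```

A higher Berger-Parker value means one taxon dominates the sample, which is why it is read alongside richness rather than on its own.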

SWAT Modelling
The digital elevation model (DEM), land use, soil type, meteorological station data, reservoir data, and agricultural production data (Table 2) were input to the SWAT model. Modelling results were stored in a Microsoft Access database and displayed in ArcGIS 10.1 (Esri, Redlands, CA, USA). The Taizi River Basin was divided into 130 sub-basins, with the sampling locations used as sub-basin outlets, in the ArcSWAT 2012 extension for ArcGIS (Blackland Research and Extension Center, Texas AgriLife Research & Grassland, Soil and Water Research Laboratory, USDA Agricultural Research Service, Texas, USA). Data on hydrological stations, reservoirs, point-source emissions, and agricultural management were also loaded for SWAT modelling. Three typical hydrological years were selected to investigate the distribution of aqueous environmental factors in each sub-basin: a flood year (2012), an average water year (2004), and a dry year (2014). From each of these years, a typical month was chosen for the dry season (April), the flood season (September), and the average water season (November). The distribution of five environmental factors (WQ, TP, TN, DO, and BOD5) was therefore explored in nine different periods (three years × three months). Values of the aqueous environmental factors were extracted from the SWAT reach output file (rch file).
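The nine simulation periods are the Cartesian product of the three typical years and the three typical months. The sketch below enumerates them and filters monthly records accordingly; the record fields (`year`, `month`, `subbasin`) are hypothetical and do not describe the column layout of the SWAT rch file.

```python
from itertools import product
from typing import Dict, List, Tuple

# Typical hydrological years and months named in the text.
TYPICAL_YEARS = {"flood year": 2012, "average water year": 2004, "dry year": 2014}
TYPICAL_MONTHS = {"dry season": 4, "flood season": 9, "average water season": 11}


def simulation_periods() -> List[Tuple[str, int, str, int]]:
    """Return (year type, year, season, month) for all 3 x 3 = 9 periods."""
    return [
        (year_type, year, season, month)
        for (year_type, year), (season, month) in product(
            TYPICAL_YEARS.items(), TYPICAL_MONTHS.items()
        )
    ]


def select_periods(records: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Keep only monthly records (hypothetical layout) that fall in one of the nine periods."""
    wanted = {(year, month) for _, year, _, month in simulation_periods()}
    return [r for r in records if (r["year"], r["month"]) in wanted]
```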
Because SWAT has more than 1000 parameters, calibrating only the sensitive parameters greatly improves modelling efficiency. In this study, the SWAT-CUP toolbox was used to calibrate the parameters automatically, based on the shuffled complex evolution algorithm (SCE-UA) developed at the University of Arizona. SCE-UA is generally considered the most efficient and effective method [28] and is widely applied to parameter calibration in hydrological modelling and related fields, such as soil erosion, groundwater, remote sensing, and surface water simulation. Runoff parameters were calibrated from 1980 to 1992 and validated from 1992 to 2002, and physicochemical parameters were calibrated from 2007 to 2008 and validated for 2009, all at a monthly time step. The Nash-Sutcliffe coefficient (NS) and the coefficient of determination (R²) were used to evaluate the calibration results. NS is one minus the ratio of the residual variance to the variance of the measured data [29], and indicates how well the plot of observed versus simulated values fits the 1:1 line. NS ranges from negative infinity to 1; values closer to 1 indicate better model performance, and NS ≥ 0.5 is considered acceptable. R² ranges from 0 to 1; values closer to 1 likewise indicate better performance, and R² ≥ 0.6 is considered acceptable.
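The two criteria can be written out directly. The sketch below, assuming paired monthly series of observed and simulated values, computes NS as one minus the ratio of the sum of squared residuals to the sum of squared deviations of the observations from their mean, computes R² as the squared Pearson correlation, and applies the acceptance thresholds stated above (NS ≥ 0.5, R² ≥ 0.6). Function names are illustrative.

```python
from typing import Sequence


def _check(observed: Sequence[float], simulated: Sequence[float]) -> None:
    if len(observed) != len(simulated) or len(observed) < 2:
        raise ValueError("need two equally long series with at least 2 values")


def nash_sutcliffe(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """NS = 1 - sum((O - S)^2) / sum((O - mean(O))^2); ranges from -inf to 1."""
    _check(observed, simulated)
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_obs


def r_squared(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Squared Pearson correlation between observed and simulated values."""
    _check(observed, simulated)
    mo = sum(observed) / len(observed)
    ms = sum(simulated) / len(simulated)
    cov = sum((o - mo) * (s - ms) for o, s in zip(observed, simulated))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_s = sum((s - ms) ** 2 for s in simulated)
    return cov * cov / (var_o * var_s)


def acceptable(observed, simulated, ns_min: float = 0.5, r2_min: float = 0.6) -> bool:
    """Apply both acceptance thresholds used in the text."""
    return (nash_sutcliffe(observed, simulated) >= ns_min
            and r_squared(observed, simulated) >= r2_min)
```

A simulated series equal to twice the observations shows why both criteria are reported: R² is 1 because the two series are perfectly correlated, while NS is strongly negative because of the systematic bias.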

SVM Modelling
The SVM is a kernel-based learning algorithm that is widely used for pattern classification and regression [30]. In this study, 10 training and validation subsets were built; in each subset, 90% of the samples were used for training and 10% for validation. Various search algorithms were applied to determine the optimal SVM parameters, selecting those that gave the lowest root-mean-square error (RMSE) on the validation subsets. The squared correlation coefficient (R²) was used to describe overall model performance.
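The 10-subset scheme with 90% training and 10% validation is a 10-fold cross-validation. The sketch below shows that scaffolding, assuming a simple grid search over candidate parameter sets scored by mean validation RMSE. The SVM itself is replaced by a k-nearest-neighbour regressor (`knn_stand_in`, hypothetical) only to keep the example dependency-free; in practice an SVM regressor and its kernel parameters would be plugged in through the same `model` interface.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Params = Dict[str, float]
# Hypothetical interface: fit on training data, return predictions for validation data.
Model = Callable[[Params, List[List[float]], List[float], List[List[float]]], List[float]]


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle sample indices and split them into k folds (about 10% each for k=10)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def cv_rmse(model: Model, params: Params, X: List[List[float]], y: List[float], k: int = 10) -> float:
    """Mean validation RMSE over k train/validation splits (same seed for every call)."""
    errors = []
    for val in k_fold_indices(len(X), k):
        val_set = set(val)
        train = [i for i in range(len(X)) if i not in val_set]
        pred = model(params, [X[i] for i in train], [y[i] for i in train], [X[i] for i in val])
        errors.append(rmse([y[i] for i in val], pred))
    return sum(errors) / len(errors)


def grid_search(model: Model, grid: List[Params], X, y, k: int = 10) -> Tuple[float, Params]:
    """Return (lowest mean RMSE, parameter set that achieved it)."""
    scored = [(cv_rmse(model, p, X, y, k), p) for p in grid]
    return min(scored, key=lambda pair: pair[0])


def knn_stand_in(params: Params, X_train, y_train, X_val) -> List[float]:
    """Stand-in regressor (NOT an SVM): mean target of the n nearest training samples."""
    n = int(params["n_neighbors"])
    preds = []
    for x in X_val:
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, xt)), yt) for xt, yt in zip(X_train, y_train)
        )
        preds.append(sum(yt for _, yt in dists[:n]) / n)
    return preds
```

Because every candidate parameter set is scored on the same seeded splits, differences in RMSE reflect the parameters rather than the partitioning.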
A sensitivity analysis was performed to identify the environmental parameters that most influence the response of the biological indices. The one-factor-at-a-time (OAT) method was used to assess the sensitivity of model variables: the SVM model was rerun with one variable removed at a time while the other variables were kept unchanged. The resulting change in overall model performance (squared correlation coefficient, R²) was used to quantify the effect of that variable, and this process was repeated for every variable. At this stage, the biological indices to be simulated for their temporal-spatial dynamics with the aid of the SWAT model were selected.
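The drop-one procedure can be expressed generically. In this sketch, `score` is a hypothetical callable that refits the model on a given subset of input variables and returns its R²; the variable whose removal yields the lowest R² is the one the index is most sensitive to.

```python
from typing import Callable, Dict, Sequence, Tuple


def oat_drop_one(
    variables: Sequence[str],
    score: Callable[[Tuple[str, ...]], float],
) -> Tuple[float, Dict[str, float]]:
    """Return the full-model R² and the R² obtained after removing each variable."""
    full_r2 = score(tuple(variables))
    reduced: Dict[str, float] = {}
    for var in variables:
        remaining = tuple(v for v in variables if v != var)
        reduced[var] = score(remaining)  # refit without `var`, others unchanged
    return full_r2, reduced


def most_sensitive(reduced: Dict[str, float]) -> str:
    """The variable whose removal lowers R² the most."""
    return min(reduced, key=reduced.get)
```

With a real SVM, `score` would retrain and re-evaluate the model for each subset, so the cost grows linearly with the number of input variables.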

Identification of the Priority Areas
The priority sub-basins in different hydrological periods were identified by setting target values for ecological restoration. First, three watershed-scale habitat typologies (highlands, midlands, and lowlands) were taken from previous studies in the Taizi River Basin. Second, these typologies were used to set target values for selected indicators. For the highlands, F_S should reach the 'good' level, and DO, TN, and TP should meet Class II of the Environmental Quality Standards for Surface Water of China (GB3838-2002) [31]. For the midlands and lowlands, F_S should reach the 'general' level, and DO, TN, and TP should meet Class IV of GB3838-2002. The specific value of each index is given in Table 3.
The target values for F_S were derived from expert opinion.
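The target check can be sketched as a per-sub-basin comparison. All numbers below are ASSUMED for illustration: the DO, TN, and TP limits are the GB3838-2002 Class II and Class IV values, and the F_S cut-offs of 12 ('good') and 8 ('general') follow the levels used later in the validation; the study's actual targets are those in Table 3. Treating a sub-basin as a priority when any single target is missed, and how boundary values are handled, are also assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Targets:
    min_fs: float  # minimum fish species richness
    min_do: float  # mg/L; DO must be at least this
    max_tn: float  # mg/L; TN must not exceed this
    max_tp: float  # mg/L; TP must not exceed this


# ASSUMED thresholds, see lead-in; replace with the values in Table 3.
CLASS_II_TARGETS = Targets(min_fs=12.0, min_do=6.0, max_tn=0.5, max_tp=0.1)
CLASS_IV_TARGETS = Targets(min_fs=8.0, min_do=3.0, max_tn=1.5, max_tp=0.3)
TARGETS = {
    "highland": CLASS_II_TARGETS,
    "midland": CLASS_IV_TARGETS,
    "lowland": CLASS_IV_TARGETS,
}


def failed_targets(typology: str, fs: float, do: float, tn: float, tp: float) -> List[str]:
    """List the indicators of one sub-basin and period that miss their target."""
    t = TARGETS[typology]
    failures = []
    if fs < t.min_fs:
        failures.append("F_S")
    if do < t.min_do:
        failures.append("DO")
    if tn > t.max_tn:
        failures.append("TN")
    if tp > t.max_tp:
        failures.append("TP")
    return failures


def needs_restoration(typology: str, fs: float, do: float, tn: float, tp: float) -> bool:
    """Assumed rule: a sub-basin is a priority if any target is missed."""
    return bool(failed_targets(typology, fs, do, tn, tp))
```

Running this for every sub-basin in each of the three seasonal periods would yield one priority map per season.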

Responses of Aquatic Biological Indices to Physicochemical Indicators
The R² values of the SVM models are shown in Figure 3. All models achieved high explained variance (R² > 0.6) except those for M_BMWP and M_S, whose R² values were 0.41 and 0.59, respectively. This indicates that the indices of fish communities (e.g., F_BP, F_S) and algal communities (e.g., A_BP, A_S) were better explained by the environmental variables than the indices of macroinvertebrate fauna (e.g., M_BMWP, M_S). Therefore, the fish and algal indices were selected for simulating temporal-spatial dynamics.
Our results further showed that, in the Taizi River, the SVM model could be a reliable prediction tool for fish and algal communities based on the selected environmental factors. However, its ability to predict macroinvertebrate communities was limited, suggesting an increased number of pollution-tolerant taxa (e.g., Orthocladiinae, Oligochaeta) and hence a reduced sensitivity of these communities to environmental stress in the Taizi River Basin.
Agricultural activities were the major type of human disturbance in this area and significantly affected algal communities. Both hydrological status (e.g., water quantity) and physicochemical conditions (e.g., COD, EC, TN) were considered in the SVM; these factors play a crucial role in the reproduction and predation of fish communities [32].

Table 4 shows the R² for each input variable of the SVM models. The OAT analysis checked the model fit after removing one variable at a time: the smaller the resulting R², the greater that variable's influence on the model fit, and the more sensitive the index is to it. For algal communities, the smallest R² was 0.94 for A_BP (TP removed) and 0.90 for A_S (TN removed). For fish communities, it was 0.93 for F_BP (BOD5), 0.62 for F_IBI (DO), and 0.93 for F_S (BOD5). For macroinvertebrate communities, it was 0.35 for M_BMWP (BOD5), 0.65 for M_EPT (TN), and 0.54 for M_S (TP). These results suggest that the sensitive environmental indicators were appropriate for simulation with the SWAT model. The sensitivity analysis also showed that algae and macroinvertebrates were more sensitive to nutrients, whereas fish communities were more sensitive to DO and organic pollutants. Nutrients have been documented as a limiting factor for algal and macroinvertebrate communities [33]. Low DO levels can exceed the tolerance limits of fish [34] and thereby affect the structure of fish communities. In marine environments, many fish become stressed at a DO level of 4.5 mg/L [35]. In the Taizi River, DO and other physicochemical indicators (such as TN and pH) have been reported to significantly affect fish spatial distribution at the reach scale [36].

Temporal-Spatial Variations in Physicochemical Indicators
The statistical indices (Table 5) showed that R² and NS in the calibration periods were higher than 0.7 at every hydrological station. In the validation periods, R² was higher than 0.6 and NS was higher than 0.7. After calibration, the SWAT model was used to simulate the temporal-spatial variations in TP, TN, DO, and BOD5 concentrations. The results showed clear spatial and temporal patterns in these aqueous environmental factors. Spatially, the conditions indicated by TP, TN, DO, and BOD5 deteriorated gradually from upstream to downstream, and the concentrations were generally lower in tributaries than in the mainstream, consistent with the human disturbance gradient. Temporally, water quality in the flood season was better than in the dry season and the average water season (Figure 4, taking TN as an example). However, the temporal distribution of physicochemical indicators was not fully consistent with annual and seasonal variations in water flow. The correlation coefficients between water flow and TN and TP concentrations were −0.19 and −0.10, respectively (Figure 5), so no clear relationship between pollutant concentrations and water flow could be established.
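The text does not state which correlation estimator underlies the coefficients in Figure 5; assuming they are Pearson coefficients between paired monthly concentration and flow values, they could be computed as in this minimal sketch. The inputs in the check are illustrative toy numbers, not the study's data.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

Values near zero, such as −0.19 and −0.10, mean that flow alone explains only a small share of the variation in concentration (r² of roughly 0.04 and 0.01).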
The modelling results indicated that hydrological characteristics affected pollutant inputs to rivers, whereas the distribution of physicochemical indicators in water bodies was more likely related to the intensity of human activities.

Temporal-Spatial Dynamics of Aquatic Organisms
A fish index (F_S, R² = 0.98) and an algal index (A_S, R² = 0.97) were selected to simulate temporal-spatial dynamics by coupling the SVM and SWAT models. F_S decreased markedly from upstream to downstream; values in the upper and middle reaches of the mainstream were greater than those in the tributaries, and values in tributaries from the upper headwaters were greater than those in the middle and lower reaches. F_S was greatest in the flood year, followed by the dry year, and lowest in the average water year. Seasonally, F_S was greatest in September and relatively lower in November and April (Figure 6).
The spatial variation in A_S was similar to that of F_S. A_S gradually decreased from upstream to downstream and was greater in the upper and middle reaches than in the tributaries. A_S was greater in tributaries from the upper headwaters than in those of the middle and lower reaches, a difference that was more apparent in the low water season than in the flood season. Annually, A_S was greatest in the flood year and lowest in the average water year. Seasonally, A_S was greatest in September and relatively lower in November and April (Figure 7).
F_S and A_S were validated against monitoring results from September 2014 (Figure 8). For F_S, the overlap ratio was 73.3% for the level 'poor and very poor' (0-8), 53.8% for 'general' (8-12), and 50% for 'good and very good' (>12). For A_S, the overlap ratio was 54.2% for 'poor and very poor' (0-16), 86.1% for 'general' (16-24), and 66.8% for 'good and very good' (>24). For F_S, the highest overlap ratio occurred in the level 'poor and very poor', indicating that simulated values were generally lower than measured values. This may be because measured values are affected by sampling method, sampling time, and other random factors, which can produce a wider range than the simulated values. For A_S, the highest overlap ratio occurred in the level 'general', suggesting that simulated values agreed most closely with measured values at that level.
The response of aquatic communities to environmental factors is complex, involving not only pollution in water bodies but also the physical condition of habitats [37]. Previous studies have pointed out that water quantity is closely related to habitat status and that quantitative relationships between flow and habitat indicators can be established [38]. Changes in water quantity therefore affect the habitat of aquatic communities. Figure 9a shows that most sub-basins needed restoration in the dry season to meet the target values set in Table 3. In the upstream area, most sub-basins needed rehabilitation. In the middle and lower reaches, the majority of tributary sub-basins required restoration, whereas only a few mainstream sub-basins did not urgently need rehabilitation.
In contrast to the dry season, fewer sub-basins required rehabilitation in the flood season (Figure 9b). In the upstream area, only the sub-basin of the downstream tributary of the Xiaotang River in Benxi County needed to be repaired, whereas in the middle and lower reaches, the sub-basins requiring restoration were mainly located along the mainstream. The number of sub-basins requiring rehabilitation in the average water season was smaller than in the dry season (Figure 9c) but larger than in the flood season. In the upper reaches, the sub-basins of the tributary flowing from Guanyinge Reservoir through Nandianzi Town of Benxi County, and the tributary sub-basin of Nanfen District of Benxi City, needed to be repaired. In the middle and lower reaches, sub-basins along the mainstream and in the northeastern part of the North Shahe River tributaries required restoration, whereas most sub-basins in the south, except those of the Haicheng River tributaries, were in relatively good condition and did not need restoration.
These results show that the number of priority zones for ecological restoration was closely related to the hydrological characteristics of the watershed. The most sub-basins needed repair in the dry season, followed by the average water season and then the flood season, indicating that aquatic biodiversity decreased as water quantity declined. Previous studies have demonstrated strong correlations between aquatic ecological status and water quantity, consistent with our results, and river restoration has accordingly focused mainly on the recovery of water quantity [8]. A number of river restoration techniques have been used to address hydrological problems, such as water diversion and constructed wetlands. Water diversion dilutes and transports contaminants by importing a large volume of cleaner water from elsewhere, whereas constructed wetlands allow the river to maintain a certain amount of water during the dry season [39]. The modelling results suggest that identifying priority zones for ecological restoration from aquatic ecological data collected in a single hydrological period is not fully reliable, because ecological status varies with hydrological conditions. Priority zones of river ecosystems should therefore be assessed across different hydrological periods to obtain comprehensive information.
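The overlap ratios used to validate F_S and A_S against the September 2014 monitoring data are not defined explicitly in the text. One plausible reading, used here purely as an assumption, is the percentage of validation sites in each measured level whose simulated value falls in the same level. The level boundaries follow those given for F_S (8 and 12) and A_S (16 and 24); assigning values that lie exactly on a boundary to 'general' is also an assumption.

```python
from collections import Counter
from typing import Dict, Sequence

POOR, GENERAL, GOOD = "poor and very poor", "general", "good and very good"


def level(value: float, lower: float, upper: float) -> str:
    """Assign a level; boundary values go to 'general' (an assumption)."""
    if value < lower:
        return POOR
    if value <= upper:
        return GENERAL
    return GOOD


def overlap_ratios(
    measured: Sequence[float], simulated: Sequence[float], lower: float, upper: float
) -> Dict[str, float]:
    """Percentage of sites per measured level whose simulated level matches."""
    totals: Counter = Counter()
    hits: Counter = Counter()
    for m, s in zip(measured, simulated):
        m_level = level(m, lower, upper)
        totals[m_level] += 1
        if level(s, lower, upper) == m_level:
            hits[m_level] += 1
    return {lvl: 100.0 * hits[lvl] / n for lvl, n in totals.items()}
```

Under this reading, a high ratio in one level and low ratios elsewhere indicates that the simulations are biased toward that level rather than uniformly accurate.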

Conclusions
In this study, a method for temporal-spatial dynamic modelling of aquatic ecological status to support ecological restoration was established for the Taizi River, China, by coupling the SWAT and SVM models. Results showed significant temporal-spatial variations in physicochemical factors (TP, TN, DO, and BOD5) and aquatic biological indices (F_S and A_S). Physicochemical indicators deteriorated gradually from upstream to downstream, and the upper and middle reaches of the mainstream were in better condition than the tributaries. Moreover, tributaries in the upper reaches were in better condition than those in the middle and lower reaches. Both aquatic organisms and aqueous physicochemical indicators implied the best ecological status in the flood season and the worst in the dry season. Simulated values of aquatic organism indices were in good agreement with the measured values. Based on these aquatic ecological dynamics, priority zones for river ecological restoration in the watershed were identified. The results demonstrated that the sub-basin of the tributary flowing from Guanyinge Reservoir through Nandianzi Town of Benxi County was the key area for ecological restoration. The priority areas for remediation in the middle and lower reaches varied with hydrological season, with more sub-basins requiring restoration in the dry season and fewer in the flood season. The approach proposed in this study could provide a reference for decision-making on ecological restoration strategies in other river ecosystems.
Author Contributions: J.F., Z.Y. and Y.Z. designed the research; M.L., F.G. and X.Z. collected and analyzed the data; J.F. wrote the manuscript under the guidance of Z.X. and F.W. All authors have read and approved the final manuscript.