Quantitative Analysis of the Driving Factors of Water Quality Variations in the Minjiang River in Southwestern China

The Minjiang River is an important first-level tributary of the Yangtze River. Understanding the driving factors of water quality variations in the Minjiang River is crucial for future policy planning of watershed ecology protection of the Yangtze River. The water quality of the Minjiang River is impacted by both meteorological factors and anthropogenic factors. By using wavelet analysis, machine learning, and Shapley analysis approaches, the impacts of meteorological factors and anthropogenic factors on the permanganate index (CODMn) and ammonia nitrogen (NH3-N) concentrations at the outlet of the Minjiang River Basin were quantified. The observed CODMn and NH3-N concentration data in the Minjiang River from 2016 to 2020 were decomposed into long-term trend signals and periodic signals. The long-term trends in water quality were driven mainly by anthropogenic factors, which accounted for 98.38% of the impact on CODMn concentrations and 98.18% of the impact on NH3-N concentrations. The periodic fluctuations in water quality in the Minjiang River Basin were mainly controlled by meteorological factors, with an impact of 68.89% on CODMn concentrations and 63.94% on NH3-N concentrations. Compared to anthropogenic factors, meteorological factors have a greater impact on water quality in the Minjiang River Basin during both the high-temperature rainy season from July to September and the winter from December to February. The separate quantification of the impacts of driving factors on the different water quality signals is the original contribution of this work, providing more intuitive insights for assessing the influences of policies and climate change on water quality.


Introduction
The Minjiang River is a primary tributary of the upper Yangtze River located in Southwestern China. As one of the earliest developed regions in Southwestern China, the Minjiang River Basin (MRB) has supported agricultural and industrial activities for centuries. In recent years, the MRB environmental authorities have enacted and operationalized various policies and ecological initiatives aimed at protecting and remediating the water quality within the basin in response to the increasing pressure on aquatic systems due to anthropogenic disturbances [1,2]. The water quality of the Minjiang River was reported to have improved significantly from 2011 to 2020, characterized by a reduction in the permanganate index (CODMn), ammonia nitrogen (NH3-N), and total phosphorus (TP) contents by 7.59%, 20.54%, and 19.68%, respectively [3].
Water 2023, 15, 3299
The water quality of rivers is impacted by both meteorological factors and anthropogenic factors [4,5]. Establishing relationships between driving factors and water quality indices is fundamental for distinguishing and quantifying the impacts from anthropogenic factors versus meteorological factors within basin-scale river systems [6,7]. Modeling tools like process-based mechanistic models and data-driven machine learning models are effective for quantitatively delineating complex driver-response dynamics [8,9].
Watershed-scale process-based models including SWAT [10], HSPF [11], and MIKE SHE [12] can represent interconnected hydrological, hydraulic, and water quality processes across heterogeneous landscapes. These models offer explicit spatiotemporal detail, mechanistic interpretability, and scenario analysis under altered climate or management conditions [13]. For example, Liu et al. [14] constructed an integrated hydrological and water quality model of the MRB based on SWAT and simulated variations of flows, pollutant concentrations, and fluxes at the outlet of the MRB from 2015 to 2018. However, rigorous data requirements, huge computational costs, and the difficulty of calibration are significant challenges in practical applications of process-based models [15].
Machine learning approaches have shown abilities for emulating watershed hydrological and water quality variations by discovering empirical patterns in monitored data. Approaches like artificial neural networks [16], regression trees [17], and support vector machines [18] utilize flexible model architectures and optimization algorithms to fit highly nonlinear relationships between driving factors and response variables. Although black-box models have limited physical interpretability for these processes [19], machine learning models have the advantage of automatically identifying key interactions and hydrological signatures in monitored data [20]. For example, the total nitrogen and total phosphorus contents of the Minjiang River have been retrieved from remote sensing data using machine learning approaches, with high reported performance [21].
In the MRB, the monitored water quality data in previous studies had relatively low time frequencies (weeks to months), and it was difficult to characterize the short-term periodic patterns. Likewise, the fluctuation magnitudes of water quality data in the MRB were generally large, giving rise to significant difficulties in identifying long-term changes in the water quality. We speculate that the long-term trends and short-term periodic patterns were controlled by different driving mechanisms.
To further identify the impacts of driving factors on the water quality, statistical approaches and scenario analyses have been commonly used [22,23]. In the MRB, the impacts of the spatial land use distribution of riparian zones were qualitatively assessed [21]. The impacts of pollution reduction measures on water quality improvement were evaluated through a scenario analysis, but the results were highly affected by the selected base year, and no conclusive quantified result was demonstrated [20]. Yuan et al. [3] developed a statistically based climate-water quality assessment framework to compare the impacts of anthropogenic factors and meteorological factors on the water quality of the Minjiang River. The framework implemented nonparametric analysis approaches to alleviate constraints due to the quality and quantity of input data. However, this statistically based framework was unable to quantify the temporal and spatial variations of the relative impacts of anthropogenic factors and meteorological factors on the water quality. Consequently, similar studies of the MRB have not successfully quantified the impacts of anthropogenic factors and meteorological factors on the long-term trends and periodic patterns of the water quality indices separately. Such a separate quantification has significant potential for the assessment of environmental policies for the watershed management of the MRB.
Based on the high-frequency water quality data monitored from 2016 to 2020 at the outlet of the MRB, a wavelet analysis was used in this study to decompose the raw time series data into signals characterizing long-term trends and periodic fluctuations. Together with the pollutant load data and meteorological data collected within the basin, machine learning and the Shapley analysis were further utilized to quantify the relative impacts of driving factors on both the long-term trends and periodic patterns of water quality indices (Figure 1). The influences of both climate change and human activities across varying timescales in the MRB were considered. The findings of this study may provide quantitative insights for accurately assessing regional policies for watershed ecological restoration, contributing to aquatic environment management in the MRB and the broader Yangtze River Basin. Additionally, the methodological framework developed in the present study may provide suggestions about water quality risk management for other watersheds.

Study Area
The study area included the mainstream of the Minjiang River, the Daduhe River Basin, and the Qingyijiang River Basin (Figure 2). The MRB has an annual mean temperature of 9.1 °C and an annual mean precipitation of 1083 mm. The mainstream of the Minjiang River originates from the southern foothills of the Min Mountains on the Sichuan-Gansu border, flowing southward with a length of 7.530 × 10² km and a basin area of 4.532 × 10³ km². The Daduhe River is a first-order tributary joining the mainstream of the Minjiang River at Leshan City, Sichuan. It flows southward through Qinghai Province and Sichuan Province with a total length of 1.074 × 10³ km and a basin area of 7.715 × 10⁴ km². The part of the Daduhe River located in Sichuan Province, with a length of 8.760 × 10² km (81.60% of the total length of the river) and a basin area of 6.792 × 10⁴ km² (88.00% of the whole basin), was considered in this study. The Qingyijiang River is a first-order tributary converging with the Daduhe River at Leshan City, with a length of 2.870 × 10² km and a basin area of 1.285 × 10⁴ km². The mean annual discharge at the outlet of the MRB is 2.850 × 10³ m³/s.

Data Sources

Pollutant Loads
The pollutant load data used in this study were derived from the Sichuan statistical yearbooks (http://tjj.sc.gov.cn/scstjj, accessed on 6 June 2023). The pollutant loads were divided into three categories: industrial sources, urban living sources, and agricultural sources. The basic spatial unit of the pollutant loads was the prefecture-level city. The industrial and urban living pollutant loads were characterized by six variables from the statistical yearbooks of Sichuan Province: the industrial wastewater discharged, COD emissions from industrial wastewater, ammonia nitrogen emissions from industrial wastewater, urban living wastewater discharged, COD emissions in urban sewage, and ammonia nitrogen emissions from domestic sewage. The data points for these variables across 10 years were fitted to their dates. Then, the data during the research period (2016-2020) were extracted, downscaled to the daily frequency, and used as input data. The migration and transformation of pollutants from agricultural sources are more complex [24], and there are few effective methods to accurately quantify the pollutant loads from agricultural sources on a large basin scale [25]. Therefore, the fertilizer use data in the Sichuan statistical yearbooks were used to reflect the agricultural pollutant loads. The raw data of the pollutant loads were collected by administrative division, requiring further spatial splitting or integration based on the distribution of GDP, population, and area of each prefecture-level city within the MRB. The data on pollutant loads used in this study were from 2016 to 2020.
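The fitting and downscaling step described above can be illustrated with a minimal Python sketch. The source does not state which fitting function was used, so an ordinary least-squares straight line is assumed here; assigning each annual value to 1 July of its year is also an assumption, and the annual loads are hypothetical numbers, not yearbook data.

```python
from datetime import date, timedelta

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def downscale_to_daily(annual, start, end):
    """Fit annual values against their dates and evaluate the fit daily.

    annual: {year: value}; each value is placed at 1 July (assumption).
    Returns {date: fitted value} for every day in [start, end].
    """
    years = sorted(annual)
    xs = [date(y, 7, 1).toordinal() for y in years]
    ys = [annual[y] for y in years]
    a, b = fit_line(xs, ys)
    daily = {}
    day = start
    while day <= end:
        daily[day] = a + b * day.toordinal()
        day += timedelta(days=1)
    return daily

# Hypothetical annual loads (arbitrary units) for 2011-2020.
annual_load = {2011 + i: 100.0 - 3.0 * i for i in range(10)}
daily_load = downscale_to_daily(annual_load, date(2016, 1, 1), date(2020, 12, 31))
```

Only the study period (2016-2020) is evaluated, mirroring the extraction step in the text; a nonlinear fit could be swapped in for `fit_line` without changing the rest of the sketch.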

Water Quality Data
The water quality data in this study were monitored at the Liangjianggou Station (longitude 104.62° E, latitude 28.78° N), which is located at the outlet of the MRB. In this study, the concentrations of CODMn and NH3-N were selected as the indices characterizing the water quality of the Minjiang River. The monitored indices and corresponding monitoring methods used in this study are shown in Table 1. The water quality data were monitored from 1 January 2016 to 31 December 2020, and the temporal resolution was 1 d. The mean annual concentrations of CODMn and NH3-N were 2.04 mg/L and 0.20 mg/L, respectively.
The meteorological data in this study were collected from the National Centers for Environmental Information of the United States (https://www.ncei.noaa.gov, accessed on 15 May 2023). Eight meteorological stations in the MRB were involved (Figure 2 and Table 2), scattered across the upper, middle, and lower reaches of the study area. The meteorological indices used in this study included the air temperature and precipitation depth. After raw data preprocessing, the time frequency was integrated from 3 h to 1 d. The period of the meteorological data is from 1 January 2016 to 31 December 2020.
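As an illustration of integrating 3-hourly meteorological records to a daily frequency, the following Python sketch averages the air temperature and sums the precipitation depth within each day. The aggregation rules (mean for temperature, sum for precipitation) are assumptions, since the source does not state them, and the observations are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def to_daily(records):
    """Aggregate sub-daily records to daily values.

    records: iterable of (timestamp, temperature_C, precipitation_mm).
    Returns {date: (mean temperature, total precipitation)}.
    """
    temps = defaultdict(list)
    precip = defaultdict(float)
    for timestamp, temperature, precipitation in records:
        day = timestamp.date()
        temps[day].append(temperature)
        precip[day] += precipitation
    return {day: (sum(v) / len(v), precip[day]) for day, v in temps.items()}

# Hypothetical 3-hourly observations for a single day at one station.
observations = [
    (datetime(2016, 1, 1, hour), 5.0 + hour / 3.0, 0.5)
    for hour in range(0, 24, 3)
]
daily_met = to_daily(observations)
```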

Wavelet Analysis
The long-term trend and periodic pattern are both important for time series data analysis [26,27]. The long-term trend describes the overall changes in an indicator over the past several years. The periodic pattern describes the cyclic variation in data on smaller time scales, such as the monthly, seasonal, quarterly, or annual patterns. The monitored water quality data include both long-term trends and periodic signals; therefore, it is necessary to decompose the water quality data through a time-frequency domain analysis to further quantify the impact of different driving factors on the long-term trend and periodic signals of water quality.
Wavelet analysis is a classic time-frequency domain method that can decompose non-stationary signals into wavelet functions at different scales and positions [28]. The basic principle is to decompose a signal into a linear combination of a set of wavelet basis functions, where each wavelet basis function is constructed from a mother wavelet function with different scaling and shifting parameters [29]. The scaling and shifting parameters of these wavelet basis functions control the time and frequency resolution of the analysis results. The wavelet analysis process consists of two steps: decomposition and reconstruction. First, the original signal is decomposed into wavelet coefficients at multiple scales and frequencies. Then, in the reconstruction step, the wavelet coefficients are combined to reconstruct the original signal [30].
In this study, the single-level discrete wavelet transform (DWT) and discrete Meyer (dmey) mother wavelet function were used in the wavelet analysis. The wavelet decomposition level was set to 10 (Figure 3).
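The decomposition and reconstruction steps can be sketched in Python as follows. This is only an illustration of the principle: it uses the Haar wavelet, which is simple enough to write out in full, rather than the dmey wavelet used in the study, and three levels on a synthetic signal rather than ten levels on monitored data. All function names are hypothetical.

```python
import math

SQRT_HALF = 1.0 / math.sqrt(2.0)

def haar_decompose(signal, levels):
    """Multilevel Haar DWT.

    Returns (approximation, details), with details ordered from the
    coarsest level to the finest, i.e. [cD_levels, ..., cD_1].
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.insert(0, [(a - b) * SQRT_HALF for a, b in pairs])
        approx = [(a + b) * SQRT_HALF for a, b in pairs]
    return approx, details

def haar_reconstruct(approx, details):
    """Inverse of haar_decompose; details must be ordered coarsest first."""
    current = list(approx)
    for detail in details:
        rebuilt = []
        for a, d in zip(current, detail):
            rebuilt.append((a + d) * SQRT_HALF)
            rebuilt.append((a - d) * SQRT_HALF)
        current = rebuilt
    return current

# Synthetic signal: slow linear trend plus a fast alternating fluctuation.
signal = [0.1 * i + (1.0 if i % 2 == 0 else -1.0) for i in range(64)]
approx, details = haar_decompose(signal, 3)

# Full reconstruction recovers the original signal.
recon = haar_reconstruct(approx, details)

# Trend signal: keep the approximation and zero every detail level.
trend = haar_reconstruct(approx, [[0.0] * len(d) for d in details])
```

Zeroing the detail coefficients before reconstruction yields the low-frequency component, which plays the role of the long-term trend signal; reconstructing from a single detail level alone would give the corresponding periodic component.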

Machine Learning Models
Machine learning algorithms have advantages in nonlinear modeling, including the handling of missing data, large-scale data processing, and fast prediction and optimization capabilities [31]. The accuracies of the relationships established between the driving factors and water quality indices of the MRB using different machine learning algorithms were compared. Then, the model with the highest accuracy was used for further quantitative analysis of the impacts of the driving factors. Support vector machines (SVMs), ensembles of trees, neural networks, and regression trees were included in this study. The computing platform was MATLAB R2020b.
SVMs are boundary-based methods that divide data into different categories by searching for the optimal hyperplane in the feature space [18]. An ensemble of trees is an ensemble learning method that can be used for both regression and classification [17]. It consists of a weighted combination of multiple trees and includes algorithms such as random forests and gradient boosting. Neural networks are built from neurons and their connections and approximate complex nonlinear functions by optimizing the model parameters [16]. The regression tree algorithm uses a tree structure in which a series of decision rules divides the data into different subsets, thereby achieving data classification and prediction [32].
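To make the regression tree and tree ensemble ideas concrete, the following Python sketch builds small regression trees from decision rules and averages bootstrap-trained trees (bagging). It is a toy illustration under stated assumptions, not the MATLAB routines used in the study: the equal-weight averaging, the tree depth, the number of trees, and the synthetic two-feature data are all hypothetical choices.

```python
import random

def sum_sq(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(X, y):
    """Return (sse, feature, threshold) of the best binary split, or None."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [t for row, t in zip(X, y) if row[j] <= thr]
            right = [t for row, t in zip(X, y) if row[j] > thr]
            sse = sum_sq(left) + sum_sq(right)
            if best is None or sse < best[0]:
                best = (sse, j, thr)
    return best

def build_tree(X, y, depth):
    """A tree is either a leaf (float mean) or (feature, thr, left, right)."""
    split = best_split(X, y) if depth > 0 and len(set(y)) > 1 else None
    if split is None:
        return sum(y) / len(y)
    _, j, thr = split
    left = [(r, t) for r, t in zip(X, y) if r[j] <= thr]
    right = [(r, t) for r, t in zip(X, y) if r[j] > thr]
    return (j, thr,
            build_tree([r for r, _ in left], [t for _, t in left], depth - 1),
            build_tree([r for r, _ in right], [t for _, t in right], depth - 1))

def predict_tree(tree, row):
    while isinstance(tree, tuple):
        j, thr, left, right = tree
        tree = left if row[j] <= thr else right
    return tree

def bagged_trees(X, y, n_trees, depth, seed=0):
    """Train each tree on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], depth))
    return trees

def predict_ensemble(trees, row):
    """Equal-weight average of the individual tree predictions."""
    return sum(predict_tree(t, row) for t in trees) / len(trees)

# Synthetic data: two drivers on a grid and a step-like response.
X = [[i / 9.0, j / 9.0] for i in range(10) for j in range(10)]
y = [1.0 + (a > 0.5) + 0.5 * (b > 0.5) for a, b in X]
ensemble = bagged_trees(X, y, n_trees=20, depth=2)
high = predict_ensemble(ensemble, [0.9, 0.9])
low = predict_ensemble(ensemble, [0.1, 0.1])
```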

Shapley Analysis
The Shapley analysis is based on cooperative game theory and is used to explain the results of machine learning models [33]. For each feature, the Shapley analysis first constructs "feature sets" that include all possible combinations with other features, and then it calculates the contribution of each feature set to the model prediction. The prediction value calculated by the model without the feature set is subtracted from the prediction value calculated by the model including the feature set, and the difference is identified as the contribution of the feature set to the model prediction value, which quantifies the impact of each feature on the result [34].
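The calculation described above can be sketched exactly for a small number of features. In the Python sketch below, a feature that is absent from a feature set is replaced by a baseline value, which is one common convention and an assumption here; the conversion of Shapley values into percentage impacts (normalised mean absolute values) is likewise an assumption, since the source does not state how the percentages were derived. The model and inputs are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature for one sample x.

    Features outside a coalition take their baseline value (assumption).
    The cost grows as 2**n, so this is only practical for few features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = value(set(subset) | {i})
                without_i = value(set(subset))
                phi[i] += weight * (with_i - without_i)
    return phi

def relative_impacts(phi_rows):
    """Mean absolute Shapley value per feature, normalised to percent."""
    n = len(phi_rows[0])
    mean_abs = [sum(abs(row[i]) for row in phi_rows) / len(phi_rows)
                for i in range(n)]
    total = sum(mean_abs)
    return [100.0 * m / total for m in mean_abs]

# Hypothetical model of three drivers: temperature, precipitation, load.
def toy_model(z):
    temperature, precipitation, load = z
    return 2.0 * temperature + 1.0 * precipitation + 0.5 * temperature * load

x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
shares = relative_impacts([phi])
```

In this toy case the interaction term between temperature and load is split equally between the two features, and the Shapley values sum to the difference between the prediction for `x` and the prediction for the baseline.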

Statistical Index
The determination coefficient (R²) output by the algorithm modules in MATLAB, which compares the simulated and observed results, was used to evaluate the modeling performance.
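For reference, a common definition of the determination coefficient is R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared differences between the observed and simulated values and SS_tot is the sum of squared deviations of the observations from their mean. The Python sketch below implements this definition; whether the MATLAB modules used in the study report exactly this quantity is an assumption, and the values are hypothetical.

```python
def r_squared(observed, simulated):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and simulated concentrations (mg/L).
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.1, 1.9, 3.2, 3.8]
score = r_squared(observed, simulated)
```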

Decomposition Results of Water Quality Indices through the Wavelet Analysis
The observed daily CODMn and NH3-N concentrations from 2016 to 2020 at Liangjianggou Station, the outlet of the MRB, were decomposed based on the wavelet analysis. The raw data were decomposed into a long-term trend signal CA10 on the interannual scale and periodic signals CD1~10 of different frequencies fluctuating around 0 (Figures 4 and 5). The long-term trends in both the CODMn and NH3-N concentrations from 2016 to 2020 showed an increasing and then a decreasing trend. The CA10 curve of the CODMn concentration increased and decreased by nearly the same magnitude (Figure 4), while the long-term NH3-N signal decreased significantly after a slight increase, close to an overall downward trend (Figure 5). In terms of the periodic signals, the amplitudes of the CODMn concentration variation gradually increased over time, while the amplitudes of the NH3-N concentration variation were relatively gentle except for the period from 2018 to 2019. The results of the present study differed from those of Yuan et al. [3], who found that the CODMn and NH3-N concentrations had significant decreasing trends between 2011 and 2020. This is likely because the decreasing trends in the CODMn and NH3-N concentrations in that study were averaged and calculated based on the observed raw data of 26 water quality monitoring stations within the MRB, where the sites in the upper reaches of the basin presented more significant trends [3]. The results of the present study originated only from data at the watershed outlet, where the water quality signals came from various sources and the trends were eliminated. Likewise, the decreasing trends in the raw CODMn and NH3-N data in Yuan et al. [3] might have been disturbed by large magnitudes of periodic fluctuations, which were detected as having considerable interannual variations in our results. This is why the raw data should be decomposed into signals with different time frequencies.

The Quantified Impacts of Driving Factors on Water Quality Indices of the MRB
The variations in the CODMn and NH3-N concentrations monitored at the outlet of the MRB were affected by a combination of meteorological and anthropogenic factors.
In terms of the anthropogenic factors, the relevant driving factors for the water quality of the MRB demonstrated distinct patterns as a result of economic developments and environmental policy implementation. The pollutant loads from industrial sources decreased from 2016 to 2020. With rapid urbanization and improvements in rural sewage treatment systems, the wastewater and chemical oxygen demand (COD) discharges from urban living sources showed increasing trends, while the NH3-N discharge from urban living sources was continuously reduced (Table 3). Agricultural non-point pollution is one of the most difficult factors to quantify accurately when assessing the impacts of anthropogenic factors. This study used fertilizer use to characterize the agricultural pollutant loads from non-point sources. The amounts of total fertilizer use, nitrogen fertilizer use, and phosphorus fertilizer use in the MRB decreased year by year, while the compound fertilizer use increased (Table 3).
In terms of the meteorological factors, the air temperature and precipitation depth from eight meteorological stations were used to analyze the impacts on the water quality indices of the MRB (Figure 1). The air temperature demonstrated significant seasonal patterns (Figure 6), which can impact the migration and transformation of aquatic organisms and pollutants by changing the dissolved oxygen concentrations, pH values, and biological activities in water as well as the water density and flow state [35]. Precipitation is an important driving force for water and material cycles. The amount, form, timing, and spatial distribution of precipitation events can affect material cycles in land-shore-river-coupled systems through runoff generation and confluence processes in basin-scale aquatic environments. Precipitation events will thus affect the pollutant concentrations and distributions in river systems [36]. According to the data from eight meteorological stations in the MRB, the annual precipitation in the MRB showed an upward trend from 2016 to 2020 (Figure 7). More precipitation can significantly increase the river runoff and improve the hydrodynamic conditions of rivers, which is conducive to enhancing the self-purification capacity of rivers. However, rainfall-runoff processes can increase the amount of pollutants entering rivers, which poses certain risks to the aquatic environment [37].
Based on the time series data of variables characterizing the anthropogenic factors and meteorology monitored in different locations of the MRB, machine learning algorithms and a Shapley analysis were combined to quantify the impact of the driving factors on the water quality indices.
The performances of four mapping models between the driving factors and water quality indices, constructed using four machine learning algorithms (ensemble trees, regression tree, neural network, and support vector machine), were evaluated and compared. The input data for the machine learning training included the pollutant loads derived from industrial, agricultural, and urban living sources as well as air temperature and precipitation data. The output data of the models included the mean daily CODMn and NH3-N concentrations at the Liangjianggou Station from 2016 to 2020 as well as the decomposed trend and periodic CODMn and NH3-N concentration signals.
The training results of the CODMn models were better than those of the NH3-N models. According to the determination coefficients (R²) of the training processes, the training results of the trend signals were the best (R² ≥ 0.99), followed by the periodic signals and monitored raw data (Table 4). This is mainly because the variation in the trend signals is smoother and simpler, which makes it easier for the models to learn. The periodic data also had better periodicity than the raw data, making the learning process easier. However, the periodic data are much more complex than the trend signals, so the training results for the long-term trend signals were the best. Among the four training methods, the ensemble trees had the best performance based on the R² values and the Taylor diagram of the modeling results (Table 4 and Figure 8). The comparisons between the simulated results and observed data are presented in Figures S1 to S4. Hence, the results obtained using the ensemble trees were used for the subsequent Shapley analysis.
Using the Shapley analysis and the trained machine learning models, the impacts of the driving factors on the monitored raw data, long-term trend signals, and periodic signals of the CODMn and NH3-N concentrations were quantified (Figures 9 and 10). For the monitored raw CODMn concentrations, the meteorological factors accounted for 64.13% of the impacts, while the anthropogenic factors accounted for 35.87%. Of the meteorological factors, the impacts caused by air temperature and precipitation were 42.14% and 21.99%, respectively. The anthropogenic factors had much less impact than the meteorological factors, with agricultural, urban living, and industrial factors accounting for 18.32%, 7.98%, and 9.57%, respectively. As for the long-term trend signal of the CODMn concentration, the anthropogenic factors accounted for 98.38%, among which the industrial sources (48.93%) and agricultural sources (32.00%) were more important than the urban living sources (17.45%). For the periodic signals of the CODMn concentrations, the result was similar to that for the raw monitored data, with a slight increase in the impacts of meteorological factors (68.89%).
For the monitored raw NH3-N concentrations at the outlet of the MRB, the anthropogenic factors were the main factors controlling the variations in NH3-N concentrations, with a contribution of 58.88%, exceeding the 41.12% contribution of the meteorological factors. Among the anthropogenic factors, the impacts on the monitored raw NH3-N data from industrial sources (23.74%) and urban living sources (22.65%) were more significant than the impacts of agricultural sources (12.49%). For the long-term trend signals of the NH3-N concentration, the impacts of anthropogenic factors accounted for 98.18%, indicating the leading role of human activities in the long-term trend variations in NH3-N concentrations at the outlet of the MRB. However, different from the long-term trend signals of the CODMn concentrations, agricultural sources (43.09%) and urban living sources (35.78%) had significantly higher impacts than industrial sources (19.32%). Likewise, variations in the NH3-N concentration periodic signals were mainly controlled by meteorological factors (63.94%), which was similar to the result for the CODMn concentration periodic signals.
Overall, in the MRB, the long-term trend variations in the water quality data were primarily controlled by anthropogenic factors, while the periodic fluctuations were dominated by meteorological factors. These results indicate the positive effect of policies restricting pollutant emissions and promoting ecological protection. In addition, the similar results for the monitored raw data and the periodic signals reflect the key role of meteorological factors, such as precipitation and temperature changes, in the short-term patterns of the water quality of the Minjiang River. The results for the monitored raw data in this study agree with the findings of Yuan et al. [3], who demonstrated that meteorological factors accounted for approximately 60% of the impacts on the CODMn concentrations of the MRB and approximately 40% of the impacts on the NH3-N concentrations. Notably, our study further delineated the impacts of different driving factors in a more detailed classification and realized a dynamic assessment of these impacts, as described in the following section.

Seasonal Patterns of Quantified Impacts of Driving Factors on Water Quality
The driving factors of the water quality at the outlet of the MRB, especially the meteorological factors, showed significant seasonal patterns within the year. Therefore, the impacts of the different driving factors on the water quality at the outlet of the MRB also varied dynamically. The monthly patterns of the impacts on the CODMn and NH3-N concentrations from 2016 to 2020 in the MRB were calculated (Figure 11) based on variations in driving factors such as precipitation, air temperature, and pollution from different sources.
Significant seasonal patterns in the impacts of all the driving factors on the CODMn and NH3-N concentrations were identified. Meteorological factors had greater impacts on the water quality indices in periods with high air temperatures and flood events (July to September), as well as in periods with low air temperatures (December to February), than in other seasons. The increased impacts in periods with higher air temperatures could be explained by enhanced biogeochemical processes in river water; for instance, temperature affects microbial metabolic rates, which can subsequently influence the dissolved oxygen concentration, self-purification capacity, and water quality of water bodies [38]. Under low-temperature conditions in winter, the thermal movement of water molecules and the kinetic energy of the Brownian motion of colloidal particles are reduced, which may lower the degradation rate of pollutants in river water [39], resulting in a harsher aquatic environment. Furthermore, seasonal patterns were also found in the impacts of the precipitation factor, which were significantly high during flood seasons. Accelerated transport of water and pollutants into rivers, triggered by more frequent storms during this period, might lead to stronger impacts on the water quality of the Minjiang River.
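The monthly patterns described above can be illustrated with a short sketch. It assumes that a signed contribution of each driving factor is available for every sample (for example, a Shapley value) and that the monthly impact of a group is the sum of absolute contributions normalized within the month. The grouping `GROUPS`, the function name, and the sample values are hypothetical and are not taken from the study's data.

```python
from collections import defaultdict

# Hypothetical mapping of driving factors to the two groups used in the text.
GROUPS = {
    "air_temperature": "meteorological",
    "precipitation": "meteorological",
    "agricultural": "anthropogenic",
    "urban_living": "anthropogenic",
    "industrial": "anthropogenic",
}


def monthly_group_shares(months, contributions, groups=GROUPS):
    """Return {month: {group: percentage share}} from per-sample contributions.

    months        -- month number (1 to 12) of each sample
    contributions -- one dict per sample, mapping factor to signed contribution
    """
    totals = defaultdict(lambda: defaultdict(float))
    for month, contrib in zip(months, contributions):
        for factor, value in contrib.items():
            # The sign only tells the direction of the effect; the magnitude
            # is what counts as impact here.
            totals[month][groups[factor]] += abs(value)

    shares = {}
    for month in sorted(totals):
        month_total = sum(totals[month].values())
        if month_total == 0.0:
            continue  # no attribution mass in this month, so leave it out
        shares[month] = {group: 100.0 * value / month_total
                         for group, value in totals[month].items()}
    return shares


# Illustrative values only: two January samples and one July sample.
months = [1, 1, 7]
contributions = [
    {"air_temperature": 0.2, "industrial": -0.2},
    {"air_temperature": 0.1, "industrial": 0.1},
    {"precipitation": 0.6, "air_temperature": -0.3, "industrial": 0.1},
]
shares = monthly_group_shares(months, contributions)
for month, by_group in shares.items():
    print(month, {g: round(v, 1) for g, v in by_group.items()})
```

Normalizing within each month makes months comparable even when the overall attribution magnitude differs between seasons; a different normalization would be equally defensible.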

Conclusions
In this study, the monitored raw data of the CODMn and NH3-N concentrations at the outlet of the MRB from 2016 to 2020 were decomposed through a wavelet analysis into two distinct kinds of signals: the long-term trend signals and the periodic signals. Machine learning approaches and a Shapley analysis were then used to quantify the impacts of different driving factors on the water quality signals at the outlet of the MRB, and the seasonality of the quantified impacts was analyzed. The coupled framework for quantifying the impacts of driving factors on decomposed water quality signals is the highlight of our study. The main research conclusions are as follows:

• The long-term trend signals of both the CODMn and NH3-N concentrations first increased and then decreased; the CODMn concentration rose and fell by roughly the same magnitude, while the NH3-N concentration showed a larger decrease. This indicates that, within the study period, the deterioration of water quality in the Minjiang River was effectively brought under control and the water quality improved significantly. The periodic signals of the CODMn concentrations exhibited a greater amplitude of fluctuation than those of the NH3-N concentrations, implying that the periodic meteorological drivers may have a more pronounced influence on the CODMn concentrations.

• Four machine learning algorithms were used to construct relationships between the driving factors and the water quality indices of the MRB. The ensemble-of-trees approach performed best for both CODMn and NH3-N concentrations (R² = 0.3648-0.9998).

• For the monitored raw data, the meteorological factors were the dominant factors affecting the variations in CODMn concentrations at the outlet of the MRB (accounting for 64.13%), while the anthropogenic factors were the major factors affecting the NH3-N concentrations (accounting for 58.88%). For the long-term trend signals, anthropogenic factors were the undisputed controlling factors, with quantified impacts of 98.38% on the CODMn concentrations and 98.18% on the NH3-N concentrations. For the periodic signals, the meteorological factors had higher impact values, with a 68.89% impact on the CODMn concentrations and a 63.94% impact on the NH3-N concentrations.

• The quantified impacts of the driving factors on the water quality of the Minjiang River had seasonal patterns. The meteorological factors had higher impacts during the flood season with high temperatures (July to September) and the dry season with low temperatures (December to February) than in other seasons, indicating that high temperatures, low temperatures, and precipitation events can significantly alter the biogeochemical processes in the MRB and thereby affect the water quality.

Likewise, compared with pollution from industrial and urban living sources, agricultural emissions fluctuate more dramatically and are more random, which makes the pollutants more difficult to quantify accurately. Therefore, our future work should devote more effort to accurate data acquisition and quantification of the pollutant loads from agricultural sources.

Figure 1. Overview of the present study.

Figure 2. The locations of the meteorological and water quality monitoring stations in the MRB.

Figure 3. (a) Flow chart of the wavelet analysis and results for the (b) CODMn and (c) NH3-N concentrations.
The CA10 signal characterized the long-term patterns of these two water quality indices, reflecting long-lasting impacts from anthropogenic activities, policy measures, and global climate change in the MRB. The signals CD1 to CD10 comprised a series of signals with different frequencies, characterizing the periodic fluctuations of the water quality indices and reflecting seasonal and stochastic impacts within the year in the MRB.
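The split into an approximation signal (the long-term trend) and detail signals (the periodic part) described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the Haar wavelet and three decomposition levels on a synthetic series, whereas the study uses ten levels (CA10 and CD1 to CD10) and the wavelet family is not given in this section. All function names are hypothetical.

```python
import math


def haar_step(x):
    """One Haar analysis step: pairwise scaled sums (approximation) and differences (detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one Haar analysis step."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def decompose(signal, levels):
    """Return (CA_levels, [CD1, ..., CD_levels]); CD1 is the finest scale."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def reconstruct(approx, details):
    """Rebuild a signal from the coarsest approximation and all details."""
    x = list(approx)
    for detail in reversed(details):
        x = haar_inverse_step(x, detail)
    return x


def split_trend_periodic(signal, levels):
    """Trend: reconstruction with all details set to zero. Periodic: the remainder."""
    approx, details = decompose(signal, levels)
    zero_details = [[0.0] * len(d) for d in details]
    trend = reconstruct(approx, zero_details)
    periodic = [s - t for s, t in zip(signal, trend)]
    return trend, periodic


# Synthetic series: slow drift plus a fast oscillation (illustrative only).
signal = [3.0 + 0.01 * i + math.sin(2.0 * math.pi * i / 4.0) for i in range(64)]
trend, periodic = split_trend_periodic(signal, levels=3)
print(trend[:8])
print(periodic[:8])
```

With the Haar wavelet, the trend obtained this way is simply the mean of each block of 2**levels samples; smoother wavelets give smoother trends at the cost of a longer filter.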

Water 2023, 17

Figure 4. Decomposed results of the CODMn concentration at the outlet of the MRB based on the wavelet analysis.

Figure 5. Decomposed results of the NH3-N concentration at the outlet of the MRB based on the wavelet analysis.

Figure 6. The air temperature variation at different meteorological stations in the MRB.

Figure 7. Variations in the mean daily and annual precipitation of different stations in the MRB.

Figure 8. Taylor diagrams of the modeling results for the monitored data, periodic signals, and long-term trend signals of the CODMn and NH3-N concentrations obtained with different machine learning approaches.

Figure 9. The quantitative impacts of driving factors on the (a) monitored raw data, (b) long-term trend signals, and (c) periodic signals of the CODMn concentration at the outlet of the MRB.

Figure 10. The quantitative impacts of driving factors on the (a) monitored raw data, (b) long-term trend signals, and (c) periodic signals of the NH3-N concentration at the outlet of the MRB.

Figure 11. The quantitative impacts of driving factors on the (a) CODMn and (b) NH3-N concentrations at the outlet of the MRB.

Supplementary Materials: Figure S1: Comparisons of simulated results versus different signals of observed data for CODMn concentrations using different machine learning algorithms; Figure S2: Comparisons of simulated results versus different signals of observed data for NH3-N concentrations using different machine learning algorithms; Figure S3: Temporal variations of simulated results versus different signals of observed data for CODMn concentrations using different machine learning algorithms. Red lines represent simulated results and blue circles represent observed data; Figure S4: Temporal variations of simulated results versus different signals of observed data for NH3-N concentrations using different machine learning algorithms. Red lines represent simulated results and blue circles represent observed data.

Author Contributions: Conceptualization, C.L. and Y.H.; methodology, C.L. and Y.H.; investigation, C.L. and F.S.; data curation, Y.H. and W.W.; writing-original draft preparation, C.L.; writing-review and editing, C.L., Y.H., F.S., L.M., Y.W. and H.Z.; supervision, B.L. All authors have read and agreed to the published version of the manuscript.

Funding: This work was supported by the National Natural Science Foundation of China (No. 42107096 and 41807407), the National Key Research and Development Program of China (No. 2020YFC1808300), and the Key Research and Development Program of Sichuan Province (No. 2021YFQ0067).

Table 1. Methods of measurement of water quality indices at the Liangjianggou Station.

Table 2. Meteorological station information in the MRB with the collected data used in the present study.

Table 3. Pollutant load data in the MRB (ten thousand tons per year).

Table 4. The determination coefficients of different training models.