Long-Time Water Quality Variations in the Yangtze River from Landsat-8 and Sentinel-2 Images Based on Neural Networks

Abstract: Total phosphorus (TP) and total nitrogen (TN) are the primary water quality parameters indicating the eutrophication status of the mainstream of the Yangtze River. Satellite remote sensing offers an economical and efficient method for monitoring the water environment over a broad geographical scope, but approaches relying on a single satellite and traditional methods remain limited. In this paper, inversion models of TN and TP are constructed and evaluated based on the neural networks (NNs) algorithm and the random forest (RF) algorithm in the upper, middle, and lower reaches of the Yangtze River, respectively. Subsequently, the monthly variations of TN and TP concentrations in the mainstream of the Yangtze River are estimated and analyzed using Landsat-8 and Sentinel-2 satellite images from January 2016 to December 2022. The results show that the NNs model exhibits better estimation performance than the RF model within the study area. The accuracy of the TN model varies across sections, with $R^2$ values of 0.70 in the upstream, 0.67 in the midstream, and 0.74 in the downstream, and respective RMSE values of 0.21 mg/L, 0.21 mg/L, and 0.23 mg/L. Similarly, the TP model has $R^2$ values of 0.71 in the upstream, 0.69 in the midstream, and 0.78 in the downstream, with corresponding RMSE values of 0.008 mg/L, 0.012 mg/L, and 0.008 mg/L. From 2016 to 2022, the concentrations of TN and TP in the mainstream of the Yangtze River exhibited an overall downward trend, with TN decreasing by 13.7% and TP decreasing by 46.2%. Furthermore, this study discusses the possible causes of water quality changes in the mainstream of the Yangtze River, with a specific focus on hydrometeorological factors.


Introduction
The management and protection of water resources have been a major topic of concern worldwide. The environmental monitoring and water quality assessment of river systems are key steps in ensuring the sustainable use of water resources. The Yangtze River, as the largest river in China, holds an important geopolitical, economic, and ecological position. It not only provides drinking and irrigation water for millions of people but also plays an important role in transportation and trade routes. In addition, the Yangtze River Basin supports diverse ecosystems and biodiversity, which have important implications for China's sustainable development. Therefore, the monitoring and protection of the water quality and ecology of the Yangtze River are of strategic importance. With social and economic development, phosphorus and nitrogen pollution has become the primary factor affecting water quality in the Yangtze River Basin. Nitrogen and phosphorus promote algal growth, and although they do not directly affect the spectral characteristics of water, they can lead to the eutrophication of water bodies, ultimately harming the health of riverine ecosystems. Monitoring river water quality and exploring eutrophication mechanisms are of great significance for the management, control, and treatment of water bodies. The main target for assessing the environmental status of rivers, lakes, groundwater, and coastal waters is the regular detection of water pollutants and the causes of their presence [1]. Satellite remote sensing technology has developed rapidly in recent years and has gradually been applied to water quality monitoring. From qualitative to quantitative remote sensing, the accuracy of remote sensing inversion of water quality has steadily improved, showing great potential for large-scale water quality monitoring. In water pollution monitoring, remote sensing technology can rapidly identify pollutant sources and provide reliable scientific guidance for prompt issue
resolution. Remote sensing monitoring has distinct advantages, including prolonged observation periods, extensive geographical coverage, rapid monitoring cycles, and low cost, which effectively compensate for the limitations of conventional monitoring methods. Currently, remote sensing inversion of optical properties related to water quality parameters is gradually maturing. Water quality inversion methods can be broadly categorized into four primary types: empirical, analytical, semi-empirical, and artificial intelligence methods. The empirical method develops a statistical regression model that correlates measured water quality parameters with reflectance in the optimal band or combination of bands [2]. Basic empirical methods struggle to meet the accuracy requirements for estimating water quality parameter concentrations because the relevant spectral features are affected by a complex mix of water quality variables, such as phytoplankton pigments and colored dissolved organic matter (CDOM). This complexity poses significant obstacles to water quality inversion [3]. Empirical methods also tend to have limited generalizability due to their dependence on specific regional and temporal conditions. In contrast, analytical methods employ bio-optical models and radiative transfer models to simulate the propagation of light through both the atmosphere and the water column. This approach characterizes the intricate relationship between water quality parameters and radiance or reflectance. Gilerson et al.
[4] evaluated the performance of bio-optical inversion models in estimating the chlorophyll concentration using comprehensive synthetic datasets of reflectance spectra and inherent optical properties associated with various water quality parameters, as well as field datasets. However, the complex composition of water bodies and radiative transfer processes necessitates the consideration of numerous factors, and this method relies on a substantial foundation of mathematical and physical principles. Additionally, the spectral resolution of most satellite sensors often does not match that of near-ground measurements. Therefore, achieving quantitative remote sensing inversion based on the analytical model is still challenging and limited in practical applications [5].
The semi-empirical method combines the empirical and analytical methods. Semi-empirical models can be fitted well using a limited set of field data together with reflectance or radiance values. Some researchers have employed this method for water quality inversion with improved accuracy. For example, Peter et al. [6] inverted chlorophyll a and phycocyanin using four semi-analytical algorithms from data acquired by CASI-2 (Compact Airborne Spectrographic Imager-2) and AISA (the Airborne Imaging Spectrometer for Applications), verifying the great potential of semi-analytical algorithms for remote sensing inversion. In general, the weak optical activity of total nitrogen (TN) and total phosphorus (TP) makes it difficult to capture a straightforward linear relationship between their concentrations and spectral reflectance or radiance. However, these constituents do influence the spectral characteristics of the water body. Neural network (NN) models still work when the underlying relationships in the data are difficult to describe [7]. Therefore, it becomes imperative to improve inversion accuracy by incorporating artificial intelligence methods into remote sensing. Artificial intelligence methods differ from the above three modeling methods in their ability to provide explicit equations that capture both complex linear and nonlinear relationships between features and target variables. A large number of studies have applied artificial intelligence methods, such as neural networks (NNs) and support vector machines (SVMs), to water quality inversion. Guo et al. [8] compared three machine learning models, random forest (RF), SVMs, and NNs, for the inversion of TP, TN, and chemical oxygen demand. Their results showed that model optimization and image band selection significantly improved the inversion of these non-optically active parameters.
Numerous scholars have studied the remote sensing inversion of TN and TP. For instance, a water quality retrieval model for TP and TN was established and evaluated using Landsat 5 Thematic Mapper (TM) data and multiple regression algorithms; the TP concentration was retrieved within a 30% mean relative error and the TN concentration within 20% [9]. The results showed that routine monitoring of water quality by remote sensing was feasible and effective. Baban [10] applied regression equations to predict water quality parameters, including chlorophyll and TP, using TM data. Isenstein et al. established a remote sensing inversion model for TP and various other water quality parameters in Lake Champlain, using a multiple linear regression model that combined Landsat ETM+ data with measured data [11]. Yue et al. employed IKONOS data to construct a multiple linear regression model and two neural network models and inverted four water quality parameters, including TN and TP, in Huangshi Ci Lake; the neural network models produced more accurate inversion results than the multiple linear regression model [12]. Liu et al.
[13] extracted 16 spectral parameters from multispectral images and constructed several inversion models for the TN, SS, and TUB water quality parameters in the East Lake of Zhejiang. Although the $R^2$ of the exponential regression function model for TP inversion reached 0.7829, the model only provided a reference for guiding pollution control in small watersheds. Sun [14] applied twelve machine learning algorithms to estimate the TN and TP concentrations in the Miyun Reservoir and analyzed the changes in the four water quality parameters at the spatial and temporal scales, along with classifying the range of water quality fluctuation. The findings indicated that the Miyun Reservoir consistently maintained a high overall water quality, meeting Class II water quality standards throughout the year. Du et al. conducted TP inversion in Taihu Lake using GOCI images and regression analysis, which achieved a peak accuracy of 0.898 and revealed variations in the TP concentrations across seasons [15]. With the development of computer technology, machine learning methods have increasingly been applied to remote sensing inversion. For instance, Zhang et al. developed a new Bayesian probabilistic neural network model and employed it to quantitatively predict water quality parameters such as TN and TP. The model demonstrated high inversion accuracies, achieving $R^2$ values of over 0.9. This advancement facilitated large-scale water quality monitoring and pinpointed pollutant sources in the Maozhou River in Guangdong Province, China [16]. He et al.
inverted the TN and TP parameters of a large inland reservoir in Jiangmen City, South China, using a BP neural network inversion model with Landsat-8 images and evaluated the eutrophication of the reservoir [17]. Karamoutsou and Psilovikos [18] tested feed-forward deep neural networks (FF-DNNs) with different structures for dissolved oxygen (DO), and all the well-trained DNNs gave satisfactory results. At present, the short-term prediction of water quality parameters is relatively mature. For example, Sentas et al. [19] evaluated the short-term prediction capabilities of ARIMA, transfer function (TF), and artificial neural network (ANN) models for the water quality parameters of Thesaurus Dam, River Nestos, Greece. Some scholars have also conducted inversions of water quality in the mainstream of the Yangtze River. For instance, an empirical (regression-based) model was proposed, and the TN and TP concentrations in the Yangtze River were retrieved and analyzed [20]. Zhao et al. proposed a joint inversion model based on Landsat-8 and Sentinel-2 and found that the accuracy of multi-source satellite inversion was higher than that of a single satellite [21]. However, only one model was used for the whole mainstream of the Yangtze River, although water quality differs among the upper, middle, and lower reaches.
The aim of this work is to quantify and analyze the long-term changes in key water quality parameters of the Yangtze River in order to better understand the trends in water quality and their impacts on the ecosystem. Inversion models of TN and TP in the upper, middle, and lower reaches of the Yangtze River are constructed and evaluated from Landsat-8 and Sentinel-2 remote sensing images based on two machine learning regression methods: the random forest method and the neural network method. Subsequently, the more effective model is used to invert the TN and TP concentrations in the mainstream of the Yangtze River from January 2016 to December 2022. The spatial and temporal patterns of the water quality changes are further studied, and the possible mechanisms behind the nutrient status of the mainstream of the Yangtze River are discussed. Section 2 describes the data and methods, Section 3 presents the results and analysis, Section 4 gives the impacts and discussion, and Section 5 summarizes the conclusions.

Study Area
The Yangtze River is the largest river in China. The mainstream of the Yangtze River runs from west to east, between 90°E-122°E and 24°N-35°N, with a total length of 6363 km. As shown in Figure 1a, the Yangtze River originates from the southwest side of the Goladandong Peak in the Tanggula Mountains on the Tibetan Plateau. Its mainstream flows through 11 provincial-level administrative regions: Qinghai Province, Tibet Autonomous Region, Sichuan Province, Yunnan Province, Chongqing Municipality, Hubei Province, Hunan Province, Jiangxi Province, Anhui Province, Jiangsu Province, and Shanghai Municipality. Finally, it empties into the East China Sea east of Chongming Island [20]. The Yangtze River serves as the primary water source for numerous cities. Any deterioration in the water quality of its mainstream could affect the regular water supply for major cities, potentially constraining the economic development of the Yangtze River Economic Zone.
The Yangtze River has a large basin area with a complex geographical environment. The color and transparency of water bodies are greatly affected by environmental factors, such as meteorological conditions, lighting conditions, and pollution sources. These factors can cause substantial interference in remote sensing data and consequently affect the accuracy of remote sensing water quality inversion. Therefore, in this study, we divided the mainstream of the Yangtze River into distinct regions and conducted independent remote sensing water quality inversion for each region, which allows a more precise inversion of the water quality in each area. Based on geographical location, we divided the mainstream into three regions, namely, the upper, middle, and lower reaches, as shown in Figure 1. These divisions take into account potential variations in the pollution sources, meteorological conditions, and lighting conditions. In this study, the upper reaches of the Yangtze River mainstream refer to the section of the Yangtze River Basin from Yibin, Sichuan, to Yichang, Hubei. This region is characterized by mountain canyons and baling landforms, with fast-flowing and turbulent river channels. The middle reaches refer to the section of the Yangtze River Basin from Yichang, Hubei, to Jiujiang, Jiangxi. Here, the terrain is gentler, and the river widens. The main tributaries in this area include Dongting Lake, Poyang Lake, Chao Lake, etc., and the water flow is relatively gentle. The lower reaches refer to the section of the Yangtze River Basin from Jiujiang, Jiangxi, to the Yangtze Estuary. This area, centered around the Yangtze River Delta, has the widest channels and slower flow rates and is interwoven with a complex network of waterways.

Data

Surface Water Quality Data
Two sources of surface water quality data were used in this study. (1) National automatic surface water quality monitoring station data released by the China Environmental Monitoring General Station (https://szzdjc.cnemc.cn:8070, accessed on 3 March 2023) from 2019 to 2022. This dataset contains TN, TP, water temperature, pH, dissolved oxygen, conductivity, turbidity, and other monitoring indicators. A total of 65 state-controlled hydrological stations along the mainstream of the Yangtze River were screened, and their distribution is shown in Figure 1b. The data are released in real time every 4 h, including 137 groups in the upper reaches, 171 groups in the middle reaches, and 267 groups in the lower reaches. This dataset has a relatively uniform temporal and spatial distribution, which improves the accuracy of the inversion model. (2) Field sample observation data collected from the mainstream section of the Yangtze River in December 2021, March 2022, and October 2022; the distribution of the sampling locations is shown in Figure 1c. The sampling process was in strict accordance with the Technical Specifications Requirements for Monitoring of Surface Water and Waste Water [22]. The determination of the water quality parameters was carried out in strict accordance with the Environmental Quality Standards for Surface Water [23]. The precision and accuracy of the data met the requirements of the national standards for water quality monitoring methods. Specifically, 10 groups of data in the upstream, 41 groups in the middle reaches, and 17 groups in the downstream were used. TN was measured by the persulfate oxidation method, in which potassium persulfate decomposes in aqueous solution above 60 °C, generating hydrogen ions and oxygen, and sodium hydroxide is added to neutralize the hydrogen ions so that the potassium persulfate decomposes completely. Under the condition of an alkaline medium at 120-124 °C, using potassium persulfate as the
oxidizing agent, not only ammonia nitrogen and nitrite but also most of the organic nitrogen compounds in the water samples are oxidized to nitrate. The absorbance was measured at 220 nm and 275 nm with a spectrophotometer to obtain the absorbance of nitrate nitrogen, from which the nitrogen content was calculated [24]. Total phosphorus was determined spectrophotometrically using ammonium molybdate: the sample was digested under neutral conditions using potassium persulfate as the oxidizing agent to oxidize all the contained phosphorus to orthophosphate [25]. In an acidic medium, orthophosphate reacts with ammonium molybdate in the presence of antimony salts to produce phosphomolybdenum heteropolyacids, which are immediately reduced by ascorbic acid to produce blue complexes. The absorbance was determined at 700 nm using a cuvette with a 30 mm optical path, with distilled water as the reference. The limit of detection was 0.01 mg/L, and the lower limit of determination was 0.04 mg/L for a sample volume of 25 mL. A WTW photoLab 7100 visible spectrophotometer was used.

Remote Sensing Image Data
Remote sensing image data from Landsat-8 and Sentinel-2 were used. Landsat-8, a land observation satellite jointly operated by the National Aeronautics and Space Administration (NASA) and the United States Geological Survey (USGS), was launched on 11 February 2013 and provides high-quality remote sensing image data on a global scale. The satellite carries two sensors, the Operational Land Imager (OLI) and the Thermal Infrared Sensor (TIRS). Landsat-8 has a total of 11 bands, with 30 m spatial resolution in bands 1-7 and 9-11 and a 15 m panchromatic band 8. The satellite achieves global coverage every 16 days. Landsat-8 imagery has a wide range of applications, including land use and land cover monitoring, agriculture, forest resource management, water resource management, and environmental monitoring. Landsat-8 imagery is freely accessible; here, we obtained it from the USGS EarthExplorer. The Sentinel-2A satellite was launched on 23 June 2015 carrying the MultiSpectral Instrument (MSI) payload. The MSI sensor has 13 bands in the visible, near-infrared, and short-wave infrared regions, with central wavelengths spanning from 490 to 2190 nm. The advantages of the Sentinel-2A satellite are a shorter revisit period and high spatial resolution, allowing for a more accurate inversion of the river water quality. Water body extraction methods can be broadly categorized into the single-band method, multi-band spectral interrelationship method, water body index method, and image classification method. The study area has a complex distribution pattern of water bodies and drastic variations in their boundaries. To address this, we used Sentinel-2 MSI remote sensing images as the original data. These Sentinel-2 data underwent a series of preprocessing steps, including scene classification, atmospheric correction, product format conversion, and parallel processing. These preprocessing tasks were accomplished using the
Sen2Cor plug-in within the ESA's image processing system known as the Sentinel Application Platform (SNAP).
$\mathrm{WI}_{2006}$, developed by Danaher et al. [26] in 2006, was designed to analyze the reflectance of the atmospheric surface layer using standard variables. It is computed from the natural logarithms of the Landsat-7 ETM+ band reflectances and their interaction terms. This index has been applied to wetlands in Eastern Australia. In 2015, based on $\mathrm{WI}_{2006}$, Fisher et al. [27] created a new water index called $\mathrm{WI}_{2015}$. For our study, we employed $\mathrm{WI}_{2015}$ as the water body index for extracting the water bodies in the Yangtze River. The expression for $\mathrm{WI}_{2015}$ is as follows:
$$\mathrm{WI}_{2015} = 1.7204 + 171\,\mathrm{GREEN} + 3\,\mathrm{RED} - 70\,\mathrm{NIR} - 45\,\mathrm{SWIR1} - 71\,\mathrm{SWIR2} \tag{1}$$
where GREEN represents the green light band, RED represents the red light band, NIR represents the near-infrared band, and SWIR1 and SWIR2 represent the short-wave infrared bands.
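The water index step can be sketched in a few lines of NumPy. This is a minimal illustration, not the processing chain used in the study: the coefficients are those published for $\mathrm{WI}_{2015}$ by Fisher et al. [27] and should be checked against that source, the function names `wi2015` and `water_mask` are hypothetical, the threshold of 0 is an assumption, and the reflectance values are made up.

```python
import numpy as np

def wi2015(green, red, nir, swir1, swir2):
    """WI2015 water index from surface reflectance arrays of equal shape."""
    green, red, nir, swir1, swir2 = (
        np.asarray(band, dtype=float) for band in (green, red, nir, swir1, swir2)
    )
    # Coefficients as published for WI2015 by Fisher et al. [27].
    return (1.7204 + 171.0 * green + 3.0 * red
            - 70.0 * nir - 45.0 * swir1 - 71.0 * swir2)

def water_mask(index, threshold=0.0):
    """Label pixels above the threshold as water (threshold is an assumption)."""
    return np.asarray(index) > threshold

# Hypothetical reflectances (0-1 scale): pixel 0 is water-like, pixel 1 land-like.
index = wi2015(
    green=[0.06, 0.08], red=[0.04, 0.10], nir=[0.02, 0.30],
    swir1=[0.01, 0.25], swir2=[0.005, 0.15],
)
mask = water_mask(index)
print(index, mask)
```

Low near-infrared and short-wave infrared reflectance pushes the index up, which is why the water-like pixel ends up above the threshold and the land-like pixel below it.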

Image Fusion
In this study, the ACOLITE algorithm is used for the atmospheric correction of Landsat-8 and Sentinel-2 images, followed by radiometric calibration and cropping of the images to ensure the same range and size. Tables 2 and 3 list detailed information for the Landsat-8 and Sentinel-2 bands with wavelengths less than 1000 nm. Figure 2 shows the spectral response functions of Landsat-8 OLI and Sentinel-2 MSI for wavelengths less than 1000 nm. To account for the differing resolutions of the Sentinel-2 and Landsat-8 bands, we resampled the Sentinel-2 image using bicubic interpolation. Compared with bilinear interpolation, this method takes additional neighboring pixels into account, which better preserves image details and gives smoother transitions. Figure 3a shows the original Sentinel-2 image of the Nanjing section of the Yangtze River, Figure 3b shows the resampled image, and Figure 3c shows the Landsat-8 image. The image fused by the principal component analysis (PCA) method is shown in Figure 3d. The PCA method first performs a principal component analysis on the original multispectral data, calculating each principal component from the eigenvalues and eigenvectors of the correlation matrix between the spectral bands. The first four principal components are selected for fusion, and the multispectral principal components of Landsat-8 and Sentinel-2 are inverted to generate two multispectral images. The panchromatic image is then fused with the multispectral inverse transformed image. The fused image contains information from both Landsat-8 and Sentinel-2, with a higher information content and visual quality for further analysis, visualization, or specific applications. PCA converts n-dimensional data into m-dimensional data, where the m-dimensional data are called principal components. Since principal components capture the most significant variance and represent key features of the original data, they can be used to eliminate redundant areas of similarity between two image
datasets while preserving distinctive features.In this study, we prioritized spectral bands with wavelengths under 1000 nm, as they exhibit greater sensitivity to water quality inversion.Specifically, bands 1, 2, 3, 4, and 5 of Landsat-8 and bands 1, 2, 3, 4, and 8A of Sentinel-2 were selected to form a new image.These bands cover the visible and near-infrared wavelengths, which have a high absorption and scattering effect on the water column and, therefore, provide valuable information in water quality inversion.

Construction of the Water Quality Inversion Model
(1) Feature Selection
To extract more useful information, feature construction is performed first: new features are built from the existing feature set and added to it. An exhaustive search over all band combinations, such as common band ratios and normalized difference indices, is used to find the best feature combination.
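A minimal sketch of this kind of feature construction is given below. It assumes that the candidate set consists of the single bands plus one ratio and one normalized difference index per band pair; the exact feature set used in the study is not specified, and the function name `build_features` and the band names are hypothetical.

```python
from itertools import combinations

import numpy as np

def build_features(bands):
    """Build candidate features from a dict of band name -> 1-D reflectance array."""
    base = {name: np.asarray(values, dtype=float) for name, values in bands.items()}
    features = dict(base)
    for a, b in combinations(base, 2):
        # Band ratio and normalized difference index for every band pair.
        features[f"{a}/{b}"] = base[a] / base[b]
        features[f"({a}-{b})/({a}+{b})"] = (base[a] - base[b]) / (base[a] + base[b])
    return features

# Synthetic positive reflectances for five bands at 30 sample points.
rng = np.random.default_rng(0)
bands = {f"B{i}": rng.uniform(0.01, 0.3, size=30) for i in range(1, 6)}
features = build_features(bands)
print(len(features))
```

With five bands this yields 5 single-band features and 10 band pairs with two features each, so 25 candidates in total.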
In the next step, feature selection is carried out to simplify the model, increase its interpretability, and reduce overfitting. A correlation analysis is performed based on the Pearson coefficient ρ, as shown in Formula (2):
$$\rho = \frac{\sum_{i=1}^{n}\left(X_i-\bar{X}\right)\left(Y_i-\bar{Y}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(Y_i-\bar{Y}\right)^2}} \tag{2}$$
where ρ is the Pearson coefficient of two groups of variables X and Y, $\bar{X}$ and $\bar{Y}$ are their mean values, and n is the number of samples. The correlation coefficient between each feature and each water quality parameter is calculated. Features with a correlation higher than 0.3 are selected to establish the inversion regression model for water quality parameters in the mainstream of the Yangtze River.
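The selection rule can be sketched as below. The text states a threshold of 0.3; applying it to the absolute value of ρ, so that strongly negative correlations are also kept, is an assumption here, and the function names and toy data are hypothetical.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two equally long 1-D samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

def select_features(features, target, threshold=0.3):
    """Keep features whose |rho| with the target exceeds the threshold."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return {name: rho for name, rho in scores.items() if abs(rho) > threshold}

# Toy example: "a" tracks the target exactly, "b" is uncorrelated with it.
target = [2.0, 4.0, 6.0, 8.0]
candidates = {"a": [1.0, 2.0, 3.0, 4.0], "b": [1.0, -1.0, -1.0, 1.0]}
selected = select_features(candidates, target)
print(selected)
```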
To eliminate the influence of dimensions and value ranges between different indicators, it is necessary to normalize the features and scale the data within a specific range to enable a comprehensive analysis. This study employs the range standardization method to linearly transform the original data and map the values to [0, 1], as shown in Formula (3):
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{3}$$
where x is the original value, $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the feature, and $x'$ is the normalized value.
(2) Model Establishment
Machine learning is an emerging approach that overcomes the limitations of traditional empirical regression models by constructing a model through extensive learning of the original features of the dataset, supplemented by manual empirical adjustments. This method is receiving growing attention and application in the field of water quality parameter inversion. The primary algorithms include neural networks, support vector machines, decision trees, etc. Machine learning is well suited to the complex nonlinear fitting involved in remote sensing water quality inversion. It reduces human intervention and, through continuous learning, selects a model suitable for simulating the complex relationship between remote sensing data and the water quality parameters. The resulting inversion model is characterized by robustness, small errors, and a good prediction effect.
In this work, 80% of the data is used as training data, and 20% is used as test data. The neural network has three layers: an input layer, a hidden layer, and an output layer. The input layer takes the band features as feature vectors, so the number of neurons in the input layer matches the number of features. The hidden layer applies a nonlinear transformation with the Sigmoid activation function, and the output layer gives the target water quality parameter. Mean squared error (MSE) was used as the loss function, and stochastic gradient descent (SGD) was chosen as the optimizer for tuning the model parameters to minimize the loss. The neural network model was trained using the training set data. In each training iteration, the input features were provided to the model, the predicted values were calculated, the loss was computed, and the optimizer was used to update the model parameters. This process was repeated until the model converged or reached a predetermined number of training iterations. The random forest regression model is a machine learning algorithm based on an ensemble of decision trees for modeling regression problems. It combines the advantages of decision trees and improves the predictive performance and stability of the model by integrating multiple decision trees.
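The training procedure described above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the hidden layer size, learning rate, number of epochs, batch size, and the synthetic data are all made up, and the function names are hypothetical. Only the overall structure (80/20 split, min-max scaling, one Sigmoid hidden layer, MSE loss, SGD updates) follows the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def min_max(x, lo, hi):
    """Range standardization to [0, 1] using given column minima and maxima."""
    return (x - lo) / (hi - lo)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_nn(X, y, hidden=8, lr=0.1, epochs=500, batch=16):
    """Input layer -> Sigmoid hidden layer -> linear output, MSE loss, mini-batch SGD."""
    n, d = X.shape
    W1 = rng.normal(0.0, 0.5, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch):
            i = idx[start:start + batch]
            h = sigmoid(X[i] @ W1 + b1)                # forward pass
            err = (h @ W2 + b2).ravel() - y[i]         # prediction error
            g_out = (2.0 / len(i)) * err[:, None]      # gradient of MSE w.r.t. output
            g_hid = (g_out @ W2.T) * h * (1.0 - h)     # backpropagate through Sigmoid
            W2 -= lr * (h.T @ g_out); b2 -= lr * g_out.sum(axis=0)
            W1 -= lr * (X[i].T @ g_hid); b1 -= lr * g_hid.sum(axis=0)
    return lambda X_new: (sigmoid(X_new @ W1 + b1) @ W2 + b2).ravel()

# Synthetic stand-in data: three "band features" and a concentration-like target.
X = rng.uniform(0.0, 0.3, size=(200, 3))
y = 0.5 + 1.0 * X[:, 0] - 2.0 * X[:, 1] ** 2 + rng.normal(0.0, 0.01, 200)

# 80% training / 20% test split; scaling limits come from the training set only.
idx = rng.permutation(len(X))
n_train = int(0.8 * len(X))
train, test = idx[:n_train], idx[n_train:]
lo, hi = X[train].min(axis=0), X[train].max(axis=0)
X_train, X_test = min_max(X[train], lo, hi), min_max(X[test], lo, hi)

predict = train_nn(X_train, y[train])
y_pred = predict(X_test)
test_mse = float(np.mean((y_pred - y[test]) ** 2))
print(test_mse)
```

Taking the scaling limits from the training set alone keeps information from the test samples out of the fitted model; whether the study scaled the data this way is not stated.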
(3) Accuracy Evaluation
Mathematical modeling of water quality inversion expresses the relationship between the actual spectral reflectance and the concentrations of the water quality parameters as a mathematical problem, and model checking examines whether the model is consistent with the actual situation. In this study, the coefficient of determination ($R^2$) and root mean square error (RMSE) are used as the evaluation indicators of the model. $R^2$ indicates how well a model fits the observed data, that is, what percentage of the variance in the observed data is explained by the model. RMSE measures the size of the gap between the model's predictions and the actual observations. Used together, $R^2$ and RMSE assess both the goodness of fit and the error of a water quality inversion model. $R^2$, known as the degree of fit, denotes the proportion of the total variation explained by the independent variables. It is calculated as in Formula (4):
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2} \tag{4}$$
where n is the total number of samples y, $y_i$ is the sample value of y, $\hat{y}_i$ is the predicted value corresponding to $y_i$, and $\bar{y}$ is the average value of y. The range of $R^2$ is [0, 1]. A higher value of $R^2$ indicates a better goodness of fit, that is, a stronger relationship between the dependent variable and independent variables and a larger proportion of the total change attributed to the independent variables. RMSE, the root mean square error, is calculated as the square root of the average of the squared differences between the predicted values and true values. It quantifies the deviation between the predicted values and their corresponding true values. The calculation is expressed as Formula (5):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2} \tag{5}$$
where n is the total number of samples y, $y_i$ is the sample value of y, $\hat{y}_i$ is the predicted value corresponding to $y_i$, and the range of the RMSE is [0, +∞). The larger the RMSE value is, the greater the error of the model. An RMSE of 0 indicates that the predicted values perfectly match the true values.
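Both metrics are short to compute directly from their definitions. The sketch below uses the standard forms of the coefficient of determination and the root mean square error; the function names and the four sample values are hypothetical and serve only as a worked example.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(y_true, y_pred):
    """Root mean square error between observations and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Worked example with made-up values (mg/L).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(r2_score(y_true, y_pred), rmse(y_true, y_pred))
```

Here the squared residuals sum to 0.10 and the total sum of squares is 5.0, so $R^2$ = 0.98 and RMSE = $\sqrt{0.025}$ ≈ 0.158. Note that when this form of $R^2$ is evaluated on held-out data it can fall below 0 if the model predicts worse than the sample mean.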

Model Establishment and Evaluation
In model construction, both the RF regression algorithm and the NNs regression algorithm were used. The accuracy metrics for each model are presented in Tables 4 and 5. The NNs model outperformed the RF model in the inversion of the TN and TP parameters for the entire Yangtze River. With the RF model, the $R^2$ value of the TN model in the upper reaches was the highest (exceeding 0.7), while those of the other models fell within the range of 0.6-0.7. With the NNs regression model, only the TN and TP models in the middle reaches yielded an $R^2$ below 0.70, while the others reached 0.70 or higher. For models of the same parameter in the same section of the Yangtze River, the RMSE of the NNs model is lower than that of the RF model. The evaluation indices thus show that the NNs model yields better estimates of the water quality parameters than the RF model in the study area, so the subsequent spatial inversion of the water quality parameter concentrations in the mainstream of the Yangtze River relies on the NNs model. To further evaluate the usability of the model, the model results and the measured results are visualized in Figure 4.

Monthly Changes
The spatial distribution of TN and TP for each month between January 2016 and December 2022, at the water body scale, was obtained by applying the validated NNs model to the values of each band for the water pixels. Figure 5 shows the time series of TN and TP for each month between January 2016 and December 2022. Overall, the monthly average values of TN and TP in the Yangtze River show a declining trend. Specifically, TN decreased by 13.7%, while TP experienced a more substantial drop of 46.2%. The variations in TP were more pronounced than those in TN.
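The text does not state exactly how the overall percentage decreases were derived from the monthly series. One plausible way, sketched below purely as an assumption, is to average the monthly values into annual means and compare the first and last years. The helper names and the sample series are hypothetical.

```python
def annual_means(monthly_values, months_per_year=12):
    """Average a monthly series into consecutive annual means."""
    if len(monthly_values) == 0 or len(monthly_values) % months_per_year != 0:
        raise ValueError("series must cover a whole number of years")
    return [
        sum(monthly_values[i:i + months_per_year]) / months_per_year
        for i in range(0, len(monthly_values), months_per_year)
    ]


def percent_change(start_value, end_value):
    """Relative change from start to end, in percent (negative = decrease)."""
    if start_value == 0:
        raise ValueError("start value must be non-zero")
    return (end_value - start_value) / start_value * 100.0


# Hypothetical two-year monthly TN series in mg/L (illustrative only).
tn_monthly = [2.0] * 12 + [1.5] * 12
yearly = annual_means(tn_monthly)
change = percent_change(yearly[0], yearly[-1])
print(yearly, change)
```

Other definitions, such as the change along a fitted linear trend, would give somewhat different percentages for the same series.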
From January 2016 to December 2022, the monthly average TN levels in the mainstream of the Yangtze River exhibited a seasonal pattern within each year and an overall downward trend. Notably, winter exhibited the lowest TP content throughout the year, while TP reached its peak during the summer months. Unlike TN, TP did not exhibit obvious seasonal patterns and instead demonstrated a consistent overall decrease over time.
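A seasonal pattern of this kind can be made explicit by averaging all values that fall in the same calendar month across years. The sketch below illustrates the idea under the assumption that the monthly series is available as (year, month, value) records; the function names and the sample records are hypothetical.

```python
from collections import defaultdict


def monthly_climatology(records):
    """Mean value per calendar month from (year, month, value) records."""
    buckets = defaultdict(list)
    for _year, month, value in records:
        buckets[month].append(value)
    return {month: sum(vals) / len(vals) for month, vals in sorted(buckets.items())}


def seasonal_amplitude(climatology):
    """Difference between the highest and lowest monthly means."""
    return max(climatology.values()) - min(climatology.values())


# Hypothetical records in mg/L (illustrative only).
records = [
    (2016, 1, 1.0), (2017, 1, 3.0),   # January of two years
    (2016, 7, 5.0), (2017, 7, 5.0),   # July of two years
]
climatology = monthly_climatology(records)
print(climatology, seasonal_amplitude(climatology))
```

A large amplitude relative to the long-term change suggests a pronounced seasonal cycle, whereas a small amplitude is consistent with a series dominated by its overall trend.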
The Government of China adopted a series of stringent environmental policies and control measures during this period aimed at reducing pollutant emissions, including nitrogen emissions. These measures include the control of industrial and agricultural discharges, the upgrading of wastewater treatment facilities, and the strengthening of environmental regulations. Such policies and measures have helped to reduce total nitrogen emissions and improve water quality. A preliminary analysis shows that the reduction of pollution in the Yangtze River and the treatment of water pollution are the main reasons for the decrease in the total phosphorus concentration. This indicates that, as TP pollution control deepens, the improvement in water quality becomes more evident [28].

Annual Changes
We applied the established TN and TP inversion models to the remote sensing images of the mainstream of the Yangtze River and obtained monthly TN and TP concentrations from 2016 to 2022. In the spatial distribution map, the region whose pixels are classified as water more than eight times in a year is designated as the annual water body area. Subsequently, the spatial distribution of the annual TN and TP parameters is obtained for the corresponding year. The annual water quality of the Yangtze River shows an overall downward trend in both TN and TP from 2016 to 2022 (Figure 6). Specifically, the TN concentrations exhibited an initial increase from 2016 to 2017, followed by a slight decline from 2017 to 2019, a significant drop in 2020, a subsequent increase in 2021, and another decrease in 2022. Meanwhile, the TP concentrations displayed a decreasing trend from 2016 to 2020. From 2020 onward, however, both TP and TN first increased and then decreased.
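The annual water body rule described above can be sketched as follows. The sketch assumes that each month provides a binary water mask and a concentration grid of the same shape, and that the annual value of a pixel is the mean of its monthly estimates over the months in which it was classified as water. That averaging step is an assumption, since the text does not specify it, and the function names and toy grids are hypothetical.

```python
def annual_water_mask(monthly_masks, min_count=8):
    """Mark pixels classified as water more than `min_count` times in a year."""
    rows = len(monthly_masks[0])
    cols = len(monthly_masks[0][0])
    return [
        [sum(mask[r][c] for mask in monthly_masks) > min_count for c in range(cols)]
        for r in range(rows)
    ]


def annual_mean_grid(monthly_grids, monthly_masks, water_mask):
    """Per-pixel mean over water months; None outside the annual water area."""
    result = []
    for r in range(len(water_mask)):
        row = []
        for c in range(len(water_mask[0])):
            if not water_mask[r][c]:
                row.append(None)
                continue
            values = [
                grid[r][c]
                for grid, mask in zip(monthly_grids, monthly_masks)
                if mask[r][c]
            ]
            row.append(sum(values) / len(values))
        result.append(row)
    return result


# Toy 1 x 2 scene: pixel 0 is water in 9 months, pixel 1 in only 8 months.
masks = [[[1 if month < 9 else 0, 1 if month < 8 else 0]] for month in range(12)]
tn_grids = [[[2.0, 1.0]] for _ in range(12)]  # hypothetical TN in mg/L

water = annual_water_mask(masks)
annual_tn = annual_mean_grid(tn_grids, masks, water)
print(water, annual_tn)
```

In the toy scene, only the first pixel exceeds the eight-observation threshold, so only that pixel receives an annual value.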

Impacts and Discussion
Many factors affect the water quality of the Yangtze River, including temperature, water level, flow, and rainfall. Based on the results of our water quality model inversion, we analyzed the monthly variations in the concentrations of TN and TP across six sections of the Yangtze River: Chongqing, Yichang in Hubei, Yueyang in Hunan, Wuhan in Hubei, Jiujiang in Jiangxi, and Anqing in Anhui (Figure 7). We then conducted a correlation analysis between the monthly averaged temperature, water level, flow, and rainfall at the six sections and the monthly values of TN and TP, with the Pearson correlation coefficient selected as the indicator (Figure 8). Temperature directly affects the activities of organisms in water and the health of ecosystems. Differences in temperature affect the density and flow of a water body, and temperature also affects the rate of chemical reactions in the water body, such as the conversion of nitrogen and phosphorus, which may influence TN and TP concentrations. Temperature is therefore one of the factors that must be considered in water quality assessment and management to ensure the health and sustainability of water bodies.
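For reference, the Pearson correlation coefficient used as the indicator can be computed as in the following minimal sketch, which pairs a monthly hydrometeorological series with a monthly concentration series. The function name and the sample values are hypothetical and do not reproduce the data behind Figure 8.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical monthly temperature (degrees C) and TN (mg/L), illustrative only.
temperature = [8.0, 12.0, 18.0, 24.0, 28.0, 22.0]
tn = [2.1, 2.0, 1.9, 1.7, 1.8, 1.9]
print("rho:", round(pearson_r(temperature, tn), 2))
```

The coefficient lies between −1 and 1, and values near 0, such as most of those reported for the six sections, indicate only a weak linear association.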
From 2016 to 2022, the average temperatures in the six typical sections were 18.5 °C, 16.9 °C, 18.2 °C, 18.0 °C, 18.1 °C, and 17.6 °C, respectively. As shown in Figure 8, there is a negative correlation between temperature and TN concentration in the various sections of the Yangtze River mainstream across different months. The negative correlation is most evident in the Yueyang section, with ρ = −0.28. That is, as the temperature increases, the TN concentration tends to decrease. This negative correlation suggests that temperature may affect aquatic ecosystems, including biological activities and the chemical transformation of nitrogen. High temperatures may promote microbial activity in the water, potentially increasing the bioabsorption and degradation rates of TN and ultimately reducing the TN concentration. In contrast, a positive correlation was observed between the temperature at the sections and TP. Higher water temperatures may foster the growth of algae and bacteria, thereby increasing the bioabsorption and release rates of TP. The TP concentrations in the Chongqing, Yichang, and Yueyang sections are the most affected by temperature, and they follow the same trend as the temperature (Figure 9). This shows that, in the upper reaches of the Yangtze River, the TP concentration in winter is significantly lower than that in summer. The TN and TP concentrations in the middle and lower reaches of the Yangtze River are less affected by temperature than those in the upper reaches. The main reason may be that the upper reaches have greater seasonal temperature differences.

Water Level
The water level is relevant to the water quality of the Yangtze River because it is one of the important physical parameters of the river and has direct and indirect effects on water quality. Rising and falling water levels can affect the solubility of oxygen in the water column. Changes in the water level can affect wastewater discharge and the dilution of wastewater. Changes in the water level of the Yangtze River can also affect the functioning of wetlands and ecological habitats around the water body, which, in turn, affects the water quality.
The water level of the mainstream of the Yangtze River changes seasonally, reaching its highest value in the summer. As shown in Figure 8, the correlation between TN and the water level in the six sections of the Yangtze River is low, with ρ values consistently below 0.2. Among them, the correlation is lowest in the Jiujiang section and highest in the Anqing section. The water level at Anqing Station is relatively stable, with a water level difference of about 10 m within a year. The TN content at Anqing Station was also relatively stable before 2019 (Figure 10). However, this study reveals that the water level of the Yangtze River has a greater impact on TP than on TN. The Yichang section exhibits the highest correlation coefficient between TP and the water level (ρ = 0.3), and the Yueyang section the lowest (ρ = 0.1). Overall, the concentrations of TN and TP show a weak positive correlation with the water level. Although the direct relationship between the water level and TN and TP is limited, changes in the water level may indirectly affect the water quality through factors such as hydrodynamics and dissolved oxygen in the water body. In addition, the rise in water level during floods leads to an increase in TN and TP content, especially in the Jiujiang section. As shown in Figure 10e, in the summers of 2016 and 2020, the water level in the Jiujiang section rose while TN and TP rose sharply. This phenomenon may be attributed to flooding, which caused erosion and overflow of the adjacent land and introduced particulate matter rich in TN and TP into the water body. When the mainstream of the Yangtze River experiences high water levels and weak water circulation, the input rate of TN and TP may exceed the treatment and discharge capacity of the water body, resulting in the accumulation of TN and TP. Complex hydrological conditions exist in the mainstream of the Yangtze River, including seasonal flood and drought cycles. These hydrologic changes may mask potential associations between water levels and TP, as flooding can cause large fluctuations in TP, and TP concentrations during droughts may be constrained by other factors.

Flow
The flow of the Yangtze River is intrinsically linked to the concentrations of TN and TP within its waters. Flow and water quality are closely related because the flow is a key controlling factor in river hydrodynamics and water quality dynamics. High flows can accelerate pollutant transport and flushing, washing pollutants from the surface and riverbanks into the water column and negatively impacting the water quality. Therefore, changes in flow can play a key role in the transport and distribution of pollutants. To investigate the relationship between flow and the TN and TP concentrations in the mainstream of the Yangtze River, we compared the monthly average flow and the water quality parameter concentrations in the six Yangtze River sections from January 2016 to December 2022. Figure 8c illustrates the correlation analysis between flow and TN concentration at the six selected sections. There is a positive correlation between flow and TN concentration. Notably, the Chongqing section exhibits the highest correlation coefficient (ρ = 0.36), indicating a clear association between flow and TN. In addition, a sharp increase in flow causes the TN concentration to rise rapidly. For instance, the flow of the Chongqing section increased sharply in the summers of 2018 and 2020, and the total nitrogen content also increased within a short period (Figure 11). This outcome underscores the influence of flow on the TN concentration within the river. Our findings reveal that an increase in flow is generally accompanied by an increase in TN concentration, which suggests that flow may affect nitrogen cycling within aquatic ecosystems. With elevated flow, it is plausible that more nitrogen species are transported from the watershed into the river, resulting in elevated TN concentrations.
Moreover, a discernible correlation between flow and TP is evident. Notably, the correlation coefficients between flow and TP in the Chongqing, Yichang, and Anqing sections all exceed 0.2. This implies that flow may play a substantial role in the transport of TP within the water column. Increased flow over a short period of time can lead to a sharp increase in TP concentrations. For example, in the summer of 2020, the flow in the Yueyang and Jiujiang sections increased, and the TP concentration increased significantly (Figure 11). The relationship between flow and TP also exhibits seasonal and spatial variations. During flood events, flow increases, potentially leading to a greater influx of TP into the water column and, subsequently, higher TP concentrations. Conversely, in the dry season, when flow diminishes, a corresponding decrease in TP concentrations may ensue.

Rainfall
The amount of rainfall is related to the water quality of the Yangtze River because rainfall is one of the important environmental factors that affect the quality of water bodies. Rainfall can wash waste, asphalt, and other pollutants from the surface and carry them into the river. Heavy rainfall or flooding events can introduce large amounts of wastewater, sediment, and pollutants that negatively impact the water quality. Rainfall can also increase the risk of soil erosion, transporting sediment and suspended solids into streams. This not only causes turbidity in the water but can also attach organic matter and pollutants to sediment particles, further affecting the water quality. Figure 12 presents the TN, TP, and rainfall data for the six sections of the Yangtze River. The rainfall in the middle and lower reaches of the Yangtze River is significantly higher than that in the upper reaches. Our examination reveals a discernible correlation between rainfall and TN concentration within the Yangtze River, with pronounced effects observed particularly in the upper reaches of the river. The values of ρ between rainfall and TN in the Chongqing and Yichang sections are 0.36 and 0.33, respectively. After rainfall events, the TN concentration usually rises for a short period. This phenomenon may reflect scouring by rainwater, which carries nitrogenous substances from the soil into the river. Therefore, rainfall events may be an important driver of short-term fluctuations in TN concentrations, and there is a positive correlation between rainfall and TN concentration. Among the six selected sections, except for Anqing, the correlation between rainfall and TP is lower than that between rainfall and TN. However, as with TN, the impact of rainfall on TP in the upper reaches of the Yangtze River is higher than that in the middle and lower reaches. The values of ρ between rainfall and TP in the two upstream sections, Chongqing and Yichang, are 0.22 and 0.30, respectively. It was also found that the relationship between rainfall and TP shows seasonal and spatial variations. In the summer, the increase in rainfall may lead to more TP being transported from the watershed to the river, thus increasing the likelihood of elevated TP concentrations. In contrast, during the dry season, the impact of rainfall on TP concentrations tends to be less pronounced.

Conclusions
Based on Landsat-8 and Sentinel-2 satellite images, this study undertook the inversion of monthly TN and TP across the entire mainstream of the Yangtze River. The mainstream of the Yangtze River was divided into three sections according to geographic location, namely, the upstream, midstream, and downstream. In each region, a remote sensing inversion model was established, incorporating principal component analysis for image fusion. To construct the inversion model, both RF regression and NNs regression methods were used and assessed. The results demonstrated that NNs regression was more suitable for the inversion of the TN and TP water quality parameters in the mainstream of the Yangtze River, with an R² value of at least 0.67 for all models. Employing the NNs regression approach, comprehensive datasets of the spatial distribution of the TN and TP concentrations in the Yangtze River mainstream were derived. The NNs model was applied to the fused images to invert the time series of TN and TP changes in the Yangtze River mainstream from January 2016 to December 2022. This analysis revealed a decreasing trend in the monthly mean values, with TN decreasing by 13.7% and TP by 46.2% over this period.
In addition, an examination of the relationship between the monthly average TN and TP concentration changes and hydrometeorological factors such as temperature, water level, flow, and rainfall was conducted in six typical sections of the Yangtze River mainstem.
The analysis results indicated that the upper reaches of the Yangtze River exhibited higher sensitivity to hydrometeorological factors than the middle and lower reaches, with rainfall and flow exerting more significant impacts on water quality in the mainstream. Among them, TN has a weak negative correlation with temperature and a positive correlation with water level, flow, and rainfall. TP has a positive correlation with temperature, water level, flow, and rainfall.
This study provides insights into the complexity and long-term trends of water quality problems in the Yangtze River. It is not only important for the management of the Yangtze River Basin but also provides valuable references for the management of other river systems and global ecosystems.

Figure 1. Study area of the Yangtze River mainstream (a) with national monitoring points (b) and sample monitoring points (c).
2.2.3. Other Data
ERA5 data are the fifth generation of ECMWF (European Centre for Medium-Range Weather Forecasts) global climate reanalysis data, provided as month-by-month climate data with a spatial resolution of 0.1° × 0.1° (https://www.ecmwf.int/en/forecasts/datasets/reanalysisdatasets/era5, accessed on 3 March 2023). The monthly mean daily precipitation (in mm) and month-by-month temperature (in °C) were obtained from ERA5 for 2016-2022 in the study area. The hydrometeorological data used in this study include the water level, flow, and precipitation, as shown in

Figure 2. Spectral response function diagram of Landsat-8 OLI and Sentinel-2 MSI.

Subsequently, we employ the principal component analysis (PCA) algorithm to fuse the Sentinel-2 and Landsat-8 images, thereby forming a new multi-band image. The concept of PCA image fusion is based on the statistical technique of dimension reduction. This technique achieves image fusion by converting the remote sensing images of multiple bands into a new set of independent bands, as shown in Figure 3d. The PCA method initially performs the principal component analysis on the original multispectral data, calculating each principal component based on the eigenvalues and eigenvectors derived from the correlation matrices between the spectral bands. The first four principal components are selected for fusion, and the multispectral principal components of Landsat-8 and Sentinel-2 are inverted to generate two multispectral images. The panchromatic image is then fused with the multispectral inverse transformed image. The fused image now contains information from both Landsat-8 and Sentinel-2, with a higher information content and visual quality for further analysis, visualization, or specific applications. PCA can convert n-dimensional data into m-dimensional data, where the m-dimensional data are called

Figure 3. Satellite images of the Nanjing section of the Yangtze River. (a,b) The Sentinel-2 images with 10 m and 30 m resolution, (c) the Landsat-8 image, and (d) the fused image.

Figure 4. Comparison of the predicted values and real values of the neural network model. (a,c,e) Comparison of the TN in the upstream, midstream, and downstream. (b,d,f) Comparison of the TP in the upstream, midstream, and downstream.

Figure 5. Time series of monthly TN and TP concentrations in the mainstream of the Yangtze River from January 2016 to December 2022.

Figure 6. Annual change trend of TN and TP in the mainstream of the Yangtze River.

Figure 7. Schematic diagram of the cross-section positions.

Figure 8. Correlation of (a) temperature, (b) water level, (c) flow, and (d) rainfall with the TN and TP concentrations in the six sections.

Figure 9. Relationship between TN and TP concentrations and temperature in the mainstream of the Yangtze River at (a) Chongqing, (b) Yichang, (c) Yueyang, (d) Wuhan, (e) Jiujiang, and (f) Anqing. The green line represents temperature, the red line represents TP, and the blue line represents TN.

Figure 10. Relationship between TN and TP concentrations and the water level in the Yangtze River at (a) Chongqing, (b) Yichang, (c) Yueyang, (d) Wuhan, (e) Jiujiang, and (f) Anqing. The green line represents the water level, the red line represents TP, and the blue line represents TN.

Figure 11. Relationship between the TN and TP concentrations and flow in the Yangtze River at (a) Chongqing, (b) Yichang, (c) Yueyang, (d) Wuhan, (e) Jiujiang, and (f) Anqing. The green line represents flow, the red line represents TP, and the blue line represents TN.

Figure 12. Relationship between the TN and TP concentrations and rainfall in the Yangtze River at (a) Chongqing, (b) Yichang, (c) Yueyang, (d) Wuhan, (e) Jiujiang, and (f) Anqing. The green line represents rainfall, the red line represents TP, and the blue line represents TN.

Table 4. Accuracy of the TN inversion models constructed by different algorithms in different reaches in the Yangtze River Basin.

Table 5. Accuracy of the TP inversion models constructed by different algorithms in different reaches in the Yangtze River Basin.