Article

Inferring Water Quality in the Songhua River Basin Using Random Forest Regression Based on Satellite Imagery and Geoinformation

1
College of Geography and Ocean Sciences, Yanbian University, Yanji 133000, China
2
College of Geographical Science, Harbin Normal University, Harbin 150025, China
*
Author to whom correspondence should be addressed.
Hydrology 2025, 12(3), 61; https://doi.org/10.3390/hydrology12030061
Submission received: 31 January 2025 / Revised: 4 March 2025 / Accepted: 13 March 2025 / Published: 17 March 2025

Abstract

Maintaining high water quality is essential not only for human survival but also for social and ecological safety. In recent years, human activities and natural factors have significantly degraded water quality, and effective water quality monitoring is urgently needed. Traditional water quality monitoring requires substantial financial investment, whereas combining remote sensing with a random forest model not only reduces operational costs but also shifts monitoring from discrete sampling points to spatially continuous surveillance. A random forest model was used to build remote sensing inversion models for three water quality parameters (conductivity, total nitrogen (TN), and total phosphorus (TP)) during the growing season (May to September) from 2020 to 2022 in the Songhua River Basin (SRB), using Landsat 8 imagery and data from China’s national water quality monitoring sections. Model validation gave an R2 of 0.67 for conductivity, 0.52 for TN, and 0.47 for TP. The results revealed that conductivity in the downstream SRB (212.72 μS/cm) was significantly higher than that upstream (161.62 μS/cm), with TN and TP concentrations exhibiting a similar increasing pattern. This study is significant for improving ecological conservation and human health in the SRB.

1. Introduction

Surface water plays a key role in human survival and development, and adequate high-quality water resources are essential for both economic development and ecological health. The quality of water bodies is influenced by both natural conditions (e.g., topography, heavy rainfall, and soil erosion) and human activities (e.g., urbanization, industrial production, and agricultural activities). In recent years, freshwater resources worldwide have faced deteriorating aquatic environmental quality, which significantly affects human health and life and restricts sustainable socioeconomic development [1,2,3]. Declining water quality is a problem that urgently needs attention in the current century and poses a serious threat to people’s quality of life and health [4,5,6,7]. Research indicates that nearly 80% of the global population faces threats to water security [8]. Real-time and comprehensive monitoring of water quality has, therefore, become urgent.
In the past, water quality parameters were obtained only through on-site monitoring and analysis. However, this approach is time consuming, labor intensive, and costly; it causes human interference in ecologically fragile areas; and it cannot readily provide the spatially continuous monitoring needed at the watershed scale [9,10,11]. Therefore, water quality studies have gradually turned to other monitoring methods. With the advantages of low cost, wide coverage, real-time monitoring, and dynamic analysis, remote sensing technology has shown great potential and efficiency in monitoring water quality at different spatial and temporal scales [12,13,14]. In recent years, low- and medium-resolution multispectral sensors, such as GOCI, MERIS, and MODIS, have been widely used for water quality assessment [15,16,17,18,19]. Extensive research on oceans and delta regions indicates that remote sensing satellites are highly effective for water quality inversion [20,21,22,23]. However, the coarse spatial resolution of these sensors (≥250 m) makes them unsuitable for monitoring narrow inland rivers. In contrast, Landsat sensors perform well in water quality inversion [24,25,26]. Landsat datasets are free and publicly available and offer rich spectral information, a stable observation cycle, and a suitable spatial resolution, which makes them well suited to monitoring water quality parameters over large inland water bodies. Currently, water quality inversion studies of inland water bodies based on remotely sensed data mainly focus on large lakes or important urban river segments. Spatially continuous remote sensing inversion of entire rivers has been studied relatively little and requires further in-depth research [27,28].
To address this gap, machine learning has emerged as a powerful tool for inverting the spatial variations in water quality parameters, particularly for processing multi-dimensional data and establishing complex nonlinear relationships between spectral characteristics and hydrological parameters. Once trained, machine learning models can efficiently invert water quality parameters from remote sensing data, thereby improving the efficiency and accuracy of inversion [29].
The issue of declining water quality is particularly severe in developing countries [8]. China, the largest developing nation in the world, has experienced tremendous economic growth coupled with a decline in water quality driven by grain production, accelerated urbanization, and other human activities [30,31]. Northeastern China’s Songhua River Basin (SRB) is vital for preserving the health of regional ecosystems and guaranteeing food supply security. The Songnen Plain, through which the river flows, is one of the largest commercial grain supply bases in China, and the region contributes 18% of China’s total grain production, with the main crops being rice, corn, and soybeans [32]. However, because of the large scale of agricultural production in the region, the use of pesticides and fertilizers has increased in pursuit of higher yields, and residual nutrients such as nitrogen (N) and phosphorus (P) enter the river with surface runoff, leading to water quality deterioration [33]. In addition, with the expansion of cities and increasing human activities, large amounts of domestic sewage and industrial wastewater are discharged into rivers; this, coupled with the impacts of global climate change, puts river ecosystems under increasing pressure. In 2022, the water quality of the SRB ranked the lowest nationwide, with only 70.5% of its water bodies classified as Class I-III (Ministry of Ecology and Environment of the People’s Republic of China, 2022), indicating that the river water quality of the region is a cause for concern. However, current water quality monitoring studies in the SRB are insufficient. Considering that the SRB serves as a crucial commercial grain supply base in China, establishing a long-term scientific water quality inversion model is essential.
Such a model would not only bolster ecological and environmental studies within the basin but also provide pivotal support for the high-quality and sustainable development of the SRB. This initiative is vital for ensuring that the SRB can continue to fulfill its role in food security while maintaining the health and stability of its ecosystems. Furthermore, improved water quality monitoring and management can contribute to the overall economic growth and environmental resilience of the region.
Based on the above context, this study focuses on the Songhua River Basin (SRB) as the research object. Considering its dual attributes as a critical commercial grain base and ecological barrier in China, coupled with intensive agricultural nonpoint source pollution, rapid urbanization, and transboundary ecological security demands within the basin, the following research objectives are proposed: (1) establish a multi-parameter collaborative inversion model by integrating Landsat-8 multispectral data, ground-measured water quality parameters (conductivity, total nitrogen (TN), total phosphorus (TP)), and meteorological–hydrological auxiliary data through machine learning algorithms (e.g., random forest, LightGBM and AdaBoost); (2) based on the inversion results, generate continuous spatial distribution maps of water quality parameters (TN, TP, conductivity) along the mainstream of the SRB during the growing seasons of 2020–2022; (3) additionally, analyze the impact of natural and anthropogenic factors on river water quality from an urban unit perspective, providing a solid foundation for further research on water quality safety in the SRB.

2. Materials and Methods

2.1. Study Area

Situated in northeastern China, the SRB is one of the country’s seven principal river basins, with a total area of 556,000 km² and a geographic coordinate range of 119°52′–132°31′ E and 41°42′–51°38′ N (Figure 1). The annual runoff of the SRB is 76.2 billion m³ [32], and the population of the region is approximately 53.8 million, with Changchun, Harbin, and Jilin among the most populous cities [34]. The area has a mild continental monsoon climate, with May through September serving as the crop-growing season; an average annual temperature range of −4.89–7.30 °C between 1991 and 2020; and an annual precipitation in the range of 341.03–1031.16 mm. The Songhua River has two sources: the west-flowing Songhua River (WSR), the southern source, which originates at Tianchi Lake in the Changbai Mountain, and the Nenjiang River, the northern source, which originates in the Yilhuli Mountains of the Daxing’anling Prefecture. The two rivers merge in Songyuan City, Jilin Province, forming the east-flowing Songhua River (ESR), which eventually joins the Heilongjiang River.
The main arteries (the WSR, Nenjiang River, and ESR) flow through China’s Inner Mongolia Autonomous Region, Jilin Province, and Heilongjiang Province, crossing a total of 16 prefectural-level municipalities (leagues). The SRB’s elevation ranges from −6 to 2676 m, with an average elevation of 387 m above sea level. It is bounded by the Changbai Mountains in the south; the Greater and Lesser Xing’an Mountains in the northwest and northeast, respectively; and the Sanjiang Plain in the east, with the Songnen Plain lying at the center. Its vast plains and black soil rich in organic matter make this region the most important commercial grain base in China. However, because of human influences such as agricultural activities, the water quality and ecology of the SRB have been of great concern, and the monitoring of river water quality has become increasingly important. Selecting the SRB as a study area not only allows us to carry out large-scale inversion analyses of water quality parameters in inland rivers but also provides a scientific reference for water quality inversion studies in other regions.

2.2. Data Collection

Permanent water body and remotely sensed data were obtained from the Google Earth Engine (GEE). Permanent water body data for the main stem of the SRB in 2020 were obtained from the JRC Yearly Water Classification History dataset v1.4 at a resolution of 30 m × 30 m [35]. Remote sensing data were obtained from USGS Landsat 8 Level 2, Collection 2, Tier 1 (30 m × 30 m) for the 2020–2022 growing season (May–September).
The measurement data (including conductivity, TN, and TP) for constructing the water quality inversion model were obtained from the National Real-Time Data Distribution System for Automatic Surface Water Quality Monitoring of the China Environmental Monitoring General Station and were updated every four hours. Measurement data from the mainstream of the Songhua River from May to September from 2020 to 2022 were selected for this study.
Socioeconomic data (including population, grain yield, cultivated land area, and cropland area as a proportion of urban area) from 2020 to 2022 were obtained from the Heilongjiang Provincial Bureau of Statistics, the Jilin Provincial Bureau of Statistics, and the statistical bureaus of Xing’an League and Hulun Buir City in the Inner Mongolia Autonomous Region.
Digital elevation model (DEM) data with a spatial resolution of 30 m × 30 m were obtained from the General Bathymetric Chart of the Oceans, which was provided by the International Hydrographic Organization (IHO). The data were produced in an exploration study co-sponsored by the IHO and the Intergovernmental Oceanographic Commission (IOC). Land use data for the SRB were taken from the dataset “The 30 m annual land cover datasets and its dynamics in China from 1990 to 2021” (Figure S1). Slope data were calculated from the DEM using ArcGIS 10.8 software. The National Tibetan Plateau Data Center in China provided annual mean temperature and precipitation data for 2020–2022 at a resolution of 1 km × 1 km (Figure S2).

2.3. Methods

2.3.1. Construction of Inversion Band Combinations

Data from 34 national water quality control cross sections along the main trunk and tributaries of the SRB were paired with Landsat 8 imagery from the GEE. To improve the accuracy of the inversion model, the time gap between the water quality measurements and satellite images was kept within 2 h. A total of 290 matched samples were obtained, of which 80% were used for model construction and 20% for model accuracy verification. Following previous research [36,37], and to make full use of the information contained in the remote sensing data, various mathematical operations (square, sum, subtraction, and hybrid computation) were applied to the remote sensing bands and combined with normalized water body indices. This yielded 14 indicators that constituted the spectral features of the SRB water quality parameter inversion model (Table 1).
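The pairing and splitting steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the function names, the sample values, and the random shuffle before the 80/20 split are all hypothetical, and the band arithmetic shown (including the McFeeters form of the normalized difference water index) merely stands in for the 14 indicators of Table 1, which are not reproduced here.

```python
import random
from datetime import datetime, timedelta

MAX_GAP = timedelta(hours=2)  # matching window stated in the text


def match_within_window(measurements, overpasses, max_gap=MAX_GAP):
    """Pair each satellite overpass with the nearest-in-time station record.

    measurements: list of (station_id, datetime, value)
    overpasses:   list of (station_id, datetime)
    """
    pairs = []
    for station, t_sat in overpasses:
        same_station = [(t_obs, v) for s, t_obs, v in measurements if s == station]
        if not same_station:
            continue
        t_obs, value = min(same_station, key=lambda rec: abs(rec[0] - t_sat))
        if abs(t_obs - t_sat) <= max_gap:
            pairs.append((station, t_sat, t_obs, value))
    return pairs


def spectral_features(green, red, nir):
    """Illustrative band arithmetic; NOT the 14 indicators of Table 1."""
    return {
        "green_squared": green ** 2,
        "green_plus_red": green + red,
        "red_minus_nir": red - nir,
        "ndwi": (green - nir) / (green + nir),
    }


def split_80_20(samples, seed=0):
    """Shuffle and split into 80% training and 20% validation samples."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical records: station A reports every four hours, overpass at 10:30.
measurements = [
    ("A", datetime(2021, 6, 1, 8, 0), 150.0),
    ("A", datetime(2021, 6, 1, 12, 0), 155.0),
    ("B", datetime(2021, 6, 1, 4, 0), 90.0),
]
overpasses = [("A", datetime(2021, 6, 1, 10, 30)), ("B", datetime(2021, 6, 1, 10, 30))]

pairs = match_within_window(measurements, overpasses)  # only station A qualifies
features = spectral_features(green=0.10, red=0.08, nir=0.05)
train, valid = split_80_20(range(290))  # 232 training and 58 validation samples
```

With station data reported every four hours, a 2 h window means that at most one record per station can match a given overpass, which is why the sketch keeps only the nearest record.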

2.3.2. Pearson Correlation Analysis

The Pearson correlation coefficient (r) is a statistical index that measures the strength and direction of the linear relationship between two variables. It was used to analyze the correlation of human activities (cropland area, grain yield, and cropland area as a proportion of urban area) and natural factors (DEM, precipitation, temperature, and slope) with conductivity, TN, and TP. Only geospatial datasets showing statistically significant correlations (p < 0.05) were integrated into the inversion model.
$$ r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}, $$
where $X = (x_1, x_2, \ldots, x_n)$ and $Y = (y_1, y_2, \ldots, y_n)$ are the two sequences of variables being compared, and $\bar{X}$ and $\bar{Y}$ are their means; r is the correlation coefficient, with a value of r between −1 and 0 indicating a negative correlation, and between 0 and 1 indicating a positive correlation. The closer r is to 0, the weaker the correlation between the two variables.
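As a worked illustration of the formula, the following sketch computes r directly from its definition using only the Python standard library. The function name and the sample values are hypothetical and are not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of products of deviations from the means.
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: product of the root sums of squared deviations.
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / (math.sqrt(ss_x) * math.sqrt(ss_y))


# Hypothetical values, e.g. a geospatial variable against a water quality parameter.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(x, y)  # deviations give 6 / sqrt(10 * 6), about 0.77
```

The p < 0.05 screening additionally needs a significance test on r; in practice a statistics library such as SciPy (`scipy.stats.pearsonr`) returns both the coefficient and its p-value.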

2.3.3. Model Construction

Machine learning, which can effectively search for and describe complex quantitative relationships, was employed for remote sensing inversion. Its advantages include the ability to handle complex linear and nonlinear relationships, adapt to high-dimensional and large datasets, automatically identify and select important features, and adjust and improve models based on new data. In addition, machine learning usually exhibits stronger predictive performance on complex problems, making it suitable for a wide range of tasks and data types. The random forest, LightGBM, and AdaBoost models have been widely used in remote sensing inversion of water quality and have shown good results [38,39,40]. We therefore modeled the data with these three algorithms and, based on the model evaluation, selected the best-performing one for water quality inversion. The overall workflow is shown in Figure 2.

2.3.4. Model Evaluation

Four metrics were used to validate and evaluate the accuracy of the inversion model: coefficient of determination (R2), mean absolute error (MAE), mean squared error (MSE), and root mean square error (RMSE). R2 typically lies between 0 and 1 and indicates how much of the variance in the observed data the model accounts for; values closer to 1 indicate a better fit. Higher model accuracy is also indicated by smaller MAE, MSE, and RMSE values.
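$$ R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, $$
$$ MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|, $$
$$ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, $$
$$ RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}, $$
where $y_i$, $\hat{y}_i$, and $\bar{y}$ are the actual, predicted, and average of actual values, respectively.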
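The four metrics can be written out directly from these definitions. The sketch below uses only the Python standard library; the function name and the sample values are illustrative and are not results from this study. Note that, computed on held-out data, the R2 formula can return a negative value when the predictions fit worse than the mean of the observations.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, MSE, and RMSE for paired actual and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [yt - yp for yt, yp in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in residuals)              # residual sum of squares
    ss_tot = sum((yt - mean_true) ** 2 for yt in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(e) for e in residuals) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
    }


# Illustrative values only.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
# ss_res = 1, ss_tot = 5, so R2 = 0.8; MAE = 0.25, MSE = 0.25, RMSE = 0.5

# Predictions worse than the mean give a negative R2 (here 1 - 5/2 = -1.5).
poor = regression_metrics([1.0, 2.0, 3.0], [3.0, 3.0, 3.0])
```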

3. Results

3.1. Pearson Correlation Results

Pearson correlation coefficients were calculated to evaluate the correlation between the geospatial data and conductivity, TN, and TP. The geospatial variables significantly correlated with conductivity (p < 0.05) were DEM, precipitation, grain yield, cropland area as a proportion of urban area, slope, and temperature (Table 2).
Those significantly correlated with TN (p < 0.05) were DEM, grain yield, slope, and temperature (Table 3), and those significantly correlated with TP (p < 0.05) were DEM, precipitation, and cropland area (Table 4). In studies of other areas, Tan et al. and Zhang et al. also found that temperature and precipitation have significant effects on TN and TP concentrations [41,42].
These significantly correlated geospatial variables were incorporated into the machine learning models.

3.2. Evaluation of Different Models

In this study, remote sensing data and the significantly correlated geographic data (p < 0.05) served as the model inputs, and the watershed water quality indicators (conductivity, TN, and TP) served as the outputs. Three machine learning models, namely random forest, LightGBM, and AdaBoost, were constructed, and their parameters were optimized using a nested cross-validation strategy. Evaluation with R2, MAE, MSE, and RMSE showed that the random forest model was superior to the LightGBM and AdaBoost models (Table 5). Although the LightGBM model scored better than the random forest model on the TN and TP training data, it scored worse on the TN and TP verification data, which may indicate overfitting. For the random forest model, the R2 values for the conductivity, TN, and TP training data reached 0.95, 0.76, and 0.73, respectively, while those for the verification data were 0.67, 0.52, and 0.47, respectively. Although the R2 values of TN and TP were lower than those of conductivity, the models performed well in terms of MAE, MSE, and RMSE (Figure 3).
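A nested cross-validation comparison of this kind can be sketched with scikit-learn as follows. This is not the configuration used in the study: the data are synthetic, the hyperparameter grids and fold counts are assumptions chosen for brevity, and LightGBM is left out only to keep the example to a single dependency.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Synthetic stand-in for the 290 matched samples (NOT the study's features).
X, y = make_regression(n_samples=290, n_features=8, noise=10.0, random_state=0)

# Candidate models with small, hypothetical hyperparameter grids.
candidates = {
    "random_forest": (
        RandomForestRegressor(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [None, 5]},
    ),
    "adaboost": (
        AdaBoostRegressor(random_state=0),
        {"n_estimators": [50, 100], "learning_rate": [0.5, 1.0]},
    ),
}

inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)  # tunes hyperparameters
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)  # estimates performance

nested_r2 = {}
for name, (estimator, grid) in candidates.items():
    # The inner search is refitted inside every outer training fold, so the
    # outer test fold never influences the choice of hyperparameters.
    search = GridSearchCV(estimator, grid, cv=inner_cv, scoring="r2")
    scores = cross_val_score(search, X, y, cv=outer_cv, scoring="r2")
    nested_r2[name] = float(np.mean(scores))

for name, score in nested_r2.items():
    print(f"{name}: mean outer-fold R2 = {score:.3f}")
```

Keeping tuning and evaluation in separate loops is what makes the outer-fold score a fair basis for comparing models, which matters here because training R2 alone would have favored LightGBM for TN and TP.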

3.3. Spatial Variation in Water Quality Parameters

A random forest regression model was applied for remote sensing inversion, and spatial distribution maps of conductivity, TN, and TP in the main SRB stream from 2020 to 2022 were generated. ArcGIS 10.8 software was used to compute zonal statistics of the inversion results by municipal administrative unit. The three municipalities with the lowest river water conductivity, ranging from 76.18 to 104.41 μS/cm, were Baishan City and the Yanbian and Daxing’anling prefectures (Figure 4), which constitute the source area of the Songhua River. Conductivity in Heihe City was between 104.42 and 132.64 μS/cm; in Hulun Buir and Jilin cities, it was in the range of 132.65–160.87 μS/cm; and in Qiqihar and Changchun cities, further downstream, it was in the range of 160.88–189.11 μS/cm. Xing’an League and Daqing, Baicheng, Songyuan, Suihua, Harbin, Hegang, and Jiamusi cities had the highest values, ranging from 189.12 to 217.34 μS/cm.
The TN values in Baishan City and Daxing’anling Prefecture were the lowest, ranging from 0.47 to 0.96 mg/L (Figure 5). The river water TN values in Yanbian Prefecture and Heihe City were in the range of 0.97–1.52 mg/L; those in Jilin and Hulun Buir cities were between 1.53 and 1.77 mg/L; and those in Qiqihar, Xing’an League, Daqing, Baicheng, Songyuan, Changchun, Hegang, and Jiamusi were relatively high, ranging from 1.78 to 1.92 mg/L. The highest river water TN values were observed in Suihua and Harbin cities, ranging from 1.93 to 2.27 mg/L.
The spatial variation in river water TP showed that Baishan, Jilin, Changchun, Daxing’anling, Heihe, Hulun Buir, and Qiqihar cities had the lowest values, ranging from 0.10 to 0.11 mg/L (Figure 6); those in Yanbian Prefecture, Songyuan City, Xing’an League, and Suihua City were in the range of 0.12–0.13 mg/L; and those in Daqing, Baicheng, and Harbin were between 0.14 and 0.15 mg/L. Hegang and Jiamusi, which are in the downstream area of the SRB, had the highest river water TP measurements, ranging from 0.18 to 0.19 mg/L.
Subsequently, statistics were compiled for the WSR (the southern source), the Nenjiang River (the northern source), and the ESR in the lower reaches of the Songhua River. The results (Table 6) show that the conductivity values ranged from 161.62 µS/cm in the WSR to 212.72 µS/cm in the ESR. The total nitrogen concentrations were highest in the ESR (2.07 mg/L), followed by the Nenjiang River (1.71 mg/L) and the WSR. Similarly, the TP concentrations were highest in the ESR (0.14 mg/L) and lowest in the WSR (0.10 mg/L). The conductivity, TN, and TP of the WSR were lower than those of the Nenjiang River, indicating that the water quality of the southern source of the Songhua River was better than that of the northern source. The downstream ESR had the poorest water quality.
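The zonal statistics used in this section, that is, averaging the inverted pixel values within each administrative unit or river section, can be illustrated with a minimal sketch. The study performed this step in ArcGIS 10.8; the toy grids, zone labels, and function name below are hypothetical.

```python
from collections import defaultdict


def zonal_mean(values, zones):
    """Mean of `values` per zone label, skipping None (non-water) pixels."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if value is None or zone is None:
                continue
            sums[zone] += value
            counts[zone] += 1
    return {zone: sums[zone] / counts[zone] for zone in sums}


# Toy inverted conductivity grid (µS/cm); None marks pixels outside the river mask.
conductivity = [
    [100.0, 120.0, None],
    [None, 200.0, 220.0],
]
# Zone label of each pixel, e.g. the municipality it falls in (hypothetical names).
zone_of_pixel = [
    ["upstream_city", "upstream_city", "upstream_city"],
    ["downstream_city", "downstream_city", "downstream_city"],
]

means = zonal_mean(conductivity, zone_of_pixel)
# upstream_city averages 100 and 120; downstream_city averages 200 and 220
```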

4. Discussion

In our study of the SRB water quality, conductivity, TN, and TP were the three key water quality indicators. These indicators not only reflect the health of the water body but also reveal its relationship with the ecological environment of the basin. The overall results show that the water quality of the WSR is better than that of the Nenjiang River, whereas the water quality of the ESR is the poorest. The difference between the two sources may arise because the cultivated land area and grain output along the WSR are lower than those along the Nenjiang River, so the WSR is less affected by social and economic activities. In particular, the cultivated land area of the Nenjiang River is 3.83 times that of the WSR (Figure S3), and the land use intensity is high, leading to a decline in water quality. Moreover, the ESR is located in the downstream part of the basin, where pollutants easily accumulate. Elements such as nitrogen and phosphorus used in agricultural activities may enter rivers through surface runoff after rainfall, which may lead to higher concentrations of TN and TP in the lower reaches of rivers than in upstream areas. Baishan City and the Daxing’anling region are located at the source of the river and have the best water quality.
Although Yanbian Prefecture is also one of the sources of the river system, its TP content was higher than that of Jilin City downstream. This may be related to the rapid development of tourism in the prefecture. Over the past two decades, the number of domestic tourists and the tourism revenue in Yanbian have increased by approximately 12 and 54 times, respectively. The bilingual-themed “bullet screen wall,” the distinctive folk customs of the Korean ethnic group, the picturesque villages, and the pleasant climate are among the key reasons for the explosive growth of tourism in Yanbian Prefecture. In 2023, the number of tourists received reached 26.464 million, generating a tourism revenue of USD 6.169 billion, and the number of tourists was 14 times the permanent population of Yanbian. Tourism demand for water resources is usually highly concentrated in space and time, which may put significant pressure on water resources in areas with high tourism revenue [43]. Even though Jiamusi and Hegang are located downstream of Harbin, the TN contents in the rivers of these two cities were lower than that in Harbin. A tributary (the Tangwang River) joins the Songhua River near the junction of Harbin and Jiamusi City, and its inflow may dilute the TN content of the main stream, resulting in a lower TN content downstream than upstream.
Remote sensing inversion of the Songhua River water quality has some unavoidable limitations. In particular, the measurement data are easily affected by temperature and precipitation [44]. The 30 m × 30 m spatial resolution of the Landsat 8 data used in this study is relatively coarse, especially for narrow river reaches, which are prone to inversion errors. In addition, model accuracy can still be improved. One possible reason is that the river is flowing, so the time difference between satellite imaging and site data acquisition has a larger effect than it would in still water. Indeed, other studies have found that the R2 was around 0.5 in remote sensing inversion of flowing rivers, whereas the accuracy is relatively high in static lakes. Zhou et al. [45] built a water quality inversion model of an urban river network with an R2 of 0.47 for TP, which is consistent with the results of this study. Meanwhile, Peng et al. [46] demonstrated higher predictive accuracy in machine learning-based water quality monitoring for Poyang Lake, with R2 values of 0.92 for TN and 0.88 for TP, further indicating that river flow may contribute to reduced model accuracy.
Because the China National Environmental Monitoring Center officially commissioned new water quality monitoring stations in May 2020, we could acquire water quality data only for the growing seasons (May–September) from 2020 to 2022. The small number of available samples may also limit model accuracy. In subsequent research, we will incorporate longer-term data to improve the generalizability of the resulting models.
Despite these shortcomings, this study is of scientific significance for the remote sensing inversion of SRB water quality parameters.

5. Conclusions

In summary, based on Landsat-8 imagery, data from the China Environmental Monitoring General Station, and geographic data, three machine learning algorithms (random forest regression, LightGBM, and AdaBoost) were used to model conductivity, TN, and TP, and their inversion performance was systematically compared. The key findings of this study are as follows:
(1) Among the three machine learning models evaluated, the random forest model performed the best. It has great potential for water quality inversion in the SRB and can continue to be used in water quality research in this area.
(2) The conductivity results show that the closer to the source, the better the water quality. However, the TP content of Yanbian Prefecture was higher than that of Jilin City downstream, which may be related to the vigorous growth of tourism in recent years.
(3) The overall results show that the water quality in the upper reaches of the Songhua River is better than that in the lower reaches and that the water quality of the west-flowing Songhua River and the Nenjiang River is much better than that of the east-flowing Songhua River.
This study is of great significance in evaluating water pollution using spatial remote sensing and random forest regression techniques. It provides scientific evidence to improve people’s awareness of the SRB environmental risks and provides a scientific basis for watershed planning and water quality management.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/hydrology12030061/s1, Figure S1: Land use in the SRB; Figure S2: Precipitation map of the Songhua River Basin; Figure S3: Cultivated land map of the SRB.

Author Contributions

Conceptualization, Z.Y. and H.Y.; Methodology, H.Y.; formal analysis, Z.Y., H.Y. and L.L.; writing—original draft, Z.Y.; writing—review and editing, H.Y. and L.L.; validation, H.Y. and X.G.; supervision, J.Y. (Jiangtao Yu) and J.Y. (Jie Yu). All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 42461017), the Natural Science Foundation of Jilin Province (Grant No. YDZJ202201ZYTS478), and the Natural Science Foundation of Jilin Province (Grant No. YDZJ202501ZYTS551).

Data Availability Statement

Research data from this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
TN      Total nitrogen
TP      Total phosphorus
SRB     Songhua River Basin
USGS    United States Geological Survey
WSR     West-flowing Songhua River
ESR     East-flowing Songhua River
GEE     Google Earth Engine
IHO     International Hydrographic Organization
IOC     Intergovernmental Oceanographic Commission
DEM     Digital elevation model
R2      Coefficient of determination
MAE     Mean absolute error
MSE     Mean squared error
RMSE    Root mean square error

References

  1. Zhang, Z.; Chen, X.; Xu, C.-Y.; Hong, Y.; Hardy, J.; Sun, Z. Examining the influence of river–lake interaction on the drought and water resources in the Poyang Lake basin. J. Hydrol. 2015, 522, 510–521.
  2. Baggio, G.; Qadir, M.; Smakhtin, V. Freshwater availability status across countries for human and ecosystem needs. Sci. Total Environ. 2021, 792, 148230.
  3. Wang, H.; Zhang, J.; Zeng, W. Intelligent simulation of aquatic environment economic policy coupled ABM and SD models. Sci. Total Environ. 2018, 618, 1160–1172.
  4. Schwarzenbach, R.P.; Egli, T.; Hofstetter, T.B.; von Gunten, U.; Wehrli, B. Global Water Pollution and Human Health. Annu. Rev. Environ. Resour. 2010, 35, 109–136.
  5. Githaiga, K.B.; Njuguna, S.M.; Gituru, R.W.; Yan, X. Water quality assessment, multivariate analysis and human health risks of heavy metals in eight major lakes in Kenya. J. Environ. Manag. 2021, 297, 113410.
  6. Shou, C.-Y.; Yue, F.-J.; Zhou, B.; Fu, X.; Ma, Z.-N.; Gong, Y.-Q.; Chen, S.-N. Chronic increasing nitrogen and endogenous phosphorus release from sediment threaten to the water quality in a semi-humid region reservoir. Sci. Total Environ. 2024, 931, 172924.
  7. Saeedi, R.; Sadeghi, S.; Massoudinejad, M.; Oroskhan, M.; Mohagheghian, A.; Mohebbi, M.; Abtahi, M. Assessing drinking water quality based on water quality indices, human health risk, and burden of disease attributable to heavy metals in rural communities of Yazd County, Iran, 2015–2021. Heliyon 2024, 10, e33984.
  8. Vörösmarty, C.J.; McIntyre, P.B.; Gessner, M.O.; Dudgeon, D.; Prusevich, A.; Green, P.; Glidden, S.; Bunn, S.E.; Sullivan, C.A.; Liermann, C.R.; et al. Global threats to human water security and river biodiversity. Nature 2010, 467, 555–561.
  9. Cao, X.; Zhang, J.; Meng, H.; Lai, Y.; Xu, M. Remote sensing inversion of water quality parameters in the Yellow River Delta. Ecol. Indic. 2023, 155, 110914.
  10. Wang, F.; Wang, Y.; Chen, Y.; Liu, K. Remote sensing approach for the estimation of particulate organic carbon in coastal waters based on suspended particulate concentration and particle median size. Mar. Pollut. Bull. 2020, 158, 111382.
  11. Harkort, L.; Duan, Z. Estimation of dissolved organic carbon from inland waters at a large scale using satellite data and machine learning methods. Water Res. 2023, 229, 119478.
  12. Chen, Y.; Arnold, W.A.; Griffin, C.G.; Olmanson, L.G.; Brezonik, P.L.; Hozalski, R.M. Assessment of the chlorine demand and disinfection byproduct formation potential of surface waters via satellite remote sensing. Water Res. 2019, 165, 115001.
  13. Guo, K.; Zou, T.; Jiang, D.; Tang, C.; Zhang, H. Variability of Yellow River turbid plume detected with satellite remote sensing during water-sediment regulation. Cont. Shelf Res. 2017, 135, 74–85.
  14. Ahmed, W.; Mohammed, S.; El-Shazly, A.; Morsy, S. Tigris River water surface quality monitoring using remote sensing data and GIS techniques. Egypt. J. Remote Sens. Space Sci. 2023, 26, 816–825.
  15. Moradi, M. Comparison of the efficacy of MODIS and MERIS data for detecting cyanobacterial blooms in the southern Caspian Sea. Mar. Pollut. Bull. 2014, 87, 311–322.
  16. Qing, S.; Zhang, J.; Cui, T.; Bao, Y. Retrieval of sea surface salinity with MERIS and MODIS data in the Bohai Sea. Remote Sens. Environ. 2013, 136, 117–125.
  17. Feng, L.; Hou, X.; Zheng, Y. Monitoring and understanding the water transparency changes of fifty large lakes on the Yangtze Plain based on long-term MODIS observations. Remote Sens. Environ. 2019, 221, 675–686.
  18. Shi, K.; Zhang, Y.; Zhang, Y.; Qin, B.; Zhu, G. Understanding the long-term trend of particulate phosphorus in a cyanobacteria-dominated lake using MODIS-Aqua observations. Sci. Total Environ. 2020, 737, 139736.
  19. Zhou, Y.; Yu, D.; Cheng, W.; Gai, Y.; Yao, H.; Yang, L.; Pan, S. Monitoring multi-temporal and spatial variations of water transparency in the Jiaozhou Bay using GOCI data. Mar. Pollut. Bull. 2022, 180, 113815.
  20. Doxaran, D.; Lamquin, N.; Park, Y.-J.; Mazeran, C.; Ryu, J.-H.; Wang, M.; Poteau, A. Retrieval of the seawater reflectance for suspended solids monitoring in the East China Sea using MODIS, MERIS and GOCI satellite data. Remote Sens. Environ. 2014, 146, 36–48. [Google Scholar] [CrossRef]
  21. Caballero, I.; Navarro, G. Application of extended full resolution MERIS imagery to assist coastal management of the area adjacent to the Guadalquivir estuary. Prog. Oceanogr. 2018, 165, 215–232. [Google Scholar] [CrossRef]
  22. Tao, B.; Mao, Z.; Lei, H.; Pan, D.; Shen, Y.; Bai, Y.; Zhu, Q.; Li, Z. A novel method for discriminating Prorocentrum donghaiense from diatom blooms in the East China Sea using MODIS measurements. Remote Sens. Environ. 2015, 158, 267–280. [Google Scholar] [CrossRef]
  23. Bernardo, N.; Watanabe, F.; Rodrigues, T.; Alcântara, E. Evaluation of the suitability of MODIS, OLCI and OLI for mapping the distribution of total suspended matter in the Barra Bonita Reservoir (Tietê River, Brazil). Remote Sens. Appl. 2016, 4, 68–82. [Google Scholar] [CrossRef]
  24. Montanher, O.C.; Novo, E.M.L.M.; Barbosa, C.C.F.; Rennó, C.D.; Silva, T.S.F. Empirical models for estimating the suspended sediment concentration in Amazonian white water rivers using Landsat 5/TM. Int. J. Appl. Earth Obs. Geoinf. 2014, 29, 67–77. [Google Scholar] [CrossRef]
  25. Griffin, C.G.; McClelland, J.W.; Frey, K.E.; Fiske, G.; Holmes, R.M. Quantifying CDOM and DOC in major Arctic rivers during ice-free conditions using Landsat TM and ETM+ data. Remote Sens. Environ. 2018, 209, 395–409. [Google Scholar] [CrossRef]
  26. Du, Y.; Song, K.; Liu, G.; Wen, Z.; Fang, C.; Shang, Y.; Zhao, F.; Wang, Q.; Du, J.; Zhang, B. Quantifying total suspended matter (TSM) in waters using Landsat images during 1984–2018 across the Songnen Plain, Northeast China. J. Environ. Manag. 2020, 262, 110334. [Google Scholar] [CrossRef]
  27. Xia, K.; Wu, T.; Li, X.; Wang, S.; Shen, Q. A new method for accurate inversion of Forel-Ule index using MODIS images—revealing the water color evolution in China’s large lakes and reservoirs over the past two decades. Water Res. 2024, 255, 121560. [Google Scholar] [CrossRef]
  28. Sahoo, D.P.; Sahoo, B.; Tiwari, M.K. MODIS-Landsat fusion-based single-band algorithms for TSS and turbidity estimation in an urban-waste-dominated river reach. Water Res. 2022, 224, 119082. [Google Scholar] [CrossRef]
  29. Zhang, S.; Wang, L.; Wang, Y.; Zhang, X.; Zhu, Y.; Ma, G. Monitoring of Low Chl-a Concentration in Hulun Lake Based on Fusion of Remote Sensing Satellite and Ground Observation Data. Remote Sens. 2024, 16, 1811. [Google Scholar] [CrossRef]
  30. Sajeev, S.; Sekar, S.; Kumar, B.; Senapathi, V.; Chung, S.Y.; Gopalakrishnan, G. Variations of water quality deterioration based on GIS techniques in surface and groundwater resources in and around Vembanad Lake, Kerala, India. Geochemistry 2020, 80 (Suppl. S4), 125626. [Google Scholar] [CrossRef]
  31. Zhang, W.; Rong, N.; Jin, X.; Meng, X.; Han, S.; Zhang, D.; Shan, B. Dissolved oxygen variation in the North China Plain river network region over 2011–2020 and the influencing factors. Chemosphere 2022, 287, 132354. [Google Scholar] [CrossRef] [PubMed]
  32. Feng, Y.; Guo, Y.; Shen, Y.; Zhang, G.; Wang, Y.; Chen, X. Change of crop structure intensified water supply-demand imbalance in China’s Black Soil Granary. Agric. Water Manag. 2024, 306, 109199. [Google Scholar] [CrossRef]
  33. Shen, L.Q.; Amatulli, G.; Sethi, T.; Raymond, P.; Domisch, S. Estimating nitrogen and phosphorus concentrations in streams and rivers, within a machine learning framework. Sci. Data. 2020, 7, 161. [Google Scholar] [CrossRef]
  34. Wang, S.; Wang, Y.; Ran, L.; Su, T. Climatic and anthropogenic impacts on runoff changes in the Songhua River basin over the last 56years (1955–2010), Northeastern China. Catena 2015, 127, 258–269. [Google Scholar] [CrossRef]
  35. Pekel, J.-F.; Cottam, A.; Gorelick, N.; Belward, A.S. High-resolution mapping of global surface water and its long-term changes. Nature 2016, 540, 418–422. [Google Scholar] [CrossRef]
  36. Maimouni, S.; Moufkari, A.A.; Daghor, L.; Fekri, A.; Oubraim, S.; Lhissou, R. Spatiotemporal monitoring of low water turbidity in Moroccan coastal lagoon using Sentinel-2 data. Remote Sens. Appl. Soc. Environ. 2022, 26, 100772. [Google Scholar] [CrossRef]
  37. Yin, F.; Yang, G.; Yan, M.; Xie, Q. Application of multispectral remote sensing technology in water quality monitoring. Desal. Water Treat. 2019, 149, 363–369. [Google Scholar] [CrossRef]
  38. Yousefi, M.; Oskoei, V.; Esmaeli, H.R.; Baziar, M. An innovative combination of extra trees within adaboost for accurate prediction of agricultural water quality indices. Results Eng. 2024, 24, 103534. [Google Scholar] [CrossRef]
  39. Li, B.; Liu, K.; Wang, M.; Wang, Y.; He, Q.; Zhuang, L.; Zhu, W. High-spatiotemporal-resolution dynamic water monitoring using LightGBM model and Sentinel-2 MSI data. Int. J. Appl. Earth Obs. Geoinf. 2023, 118, 103278. [Google Scholar] [CrossRef]
  40. Wang, F.; Wang, Y.; Zhang, K.; Hu, M.; Weng, Q.; Zhang, H. Spatial heterogeneity modeling of water quality based on random forest regression and model interpretation. Environ. Res. 2021, 202, 111660. [Google Scholar] [CrossRef]
  41. Tan, Z.; Ren, J.; Li, S.; Li, W.; Zhang, R.; Sun, T. Inversion of Nutrient Concentrations Using Machine Learning and Influencing Factors in Minjiang River. Water 2023, 15, 1398. [Google Scholar] [CrossRef]
  42. Zhang, Y.; Jin, S.; Wang, N.; Zhao, J.; Guo, H.; Pellikka, P. Total Phosphorus and Nitrogen Dynamics and Influencing Factors in Dongting Lake Using Landsat Data. Remote Sens. 2022, 14, 5648. [Google Scholar] [CrossRef]
  43. Hadjikakou, M.; Miller, G.; Chenoweth, J.; Druckman, A.; Zoumides, C. A comprehensive framework for comparing water use intensity across different tourist types. J. Sustain. Tour. 2015, 23, 1445–1467. [Google Scholar] [CrossRef]
  44. Debels, P.; Figueroa, R.; Urrutia, R.; Barra, R.; Niell, X. Evaluation of Water Quality in the Chillán River (Central Chile) Using Physicochemical Parameters and a Modified Water Quality Index. Environ. Monit. Assess. 2005, 110, 301–322. [Google Scholar] [CrossRef] [PubMed]
  45. Zhou, X.; Liu, C.; Carrion, D.; Akbar, A.; Wang, H. Spectro-environmental factors integrated ensemble learning for urban river network water quality remote sensing. Water Res. 2024, 267, 122544. [Google Scholar] [CrossRef]
  46. Peng, C.; Xie, Z.; Jin, X. Using Ensemble Learning for Remote Sensing Inversion of Water Quality Parameters in Poyang Lake. Sustainability 2024, 16, 3355. [Google Scholar] [CrossRef]
Figure 1. Location map of the study area.
Figure 2. Workflow chart of study. Proportion means cropland area as a proportion of urban area.
Figure 3. Random forest regression model training and test results. (a) Conductivity; (b) TN; (c) TP.
Figure 4. Spatial variation map of conductivity in the Songhua River.
Figure 5. Spatial variation map of TN in the Songhua River.
Figure 6. Spatial variation map of TP in the Songhua River.
Table 1. Selection of remote sensing band combinations.
| Feature Classification | Band Combination | Number |
|---|---|---|
| Single band | $B_2$, $B_3$, $B_4$, $B_5$, $B_6$, $B_7$ | 6 |
| Band square index | $B_4^2$, $B_7^2$ | 2 |
| Band sum index | $B_3 + B_4$ | 1 |
| Band subtraction index | $B_2 - B_3$, $B_2 - B_4$ | 2 |
| Band mixture index | $(B_2 + B_3)/B_4$ | 1 |
| Modified Normalized Difference Water Index (MNDWI) | $(B_3 - B_5)/(B_3 + B_5)$ | 1 |
| Normalized Difference Vegetation Index (NDVI) | $(B_5 - B_4)/(B_5 + B_4)$ | 1 |
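As a minimal sketch, the 14 candidate features of Table 1 could be computed per pixel as shown here. The function name `band_features` and the dictionary keys are hypothetical, and the operators are an assumed reading of the table: differences for the subtraction indices, $(B_2 + B_3)/B_4$ for the mixture index, and normalized differences for MNDWI and NDVI. Inputs are assumed to be Landsat 8 reflectance values for bands B2 to B7.

```python
def band_features(b2, b3, b4, b5, b6, b7):
    """Return the 14 candidate spectral features of Table 1 for one pixel.

    Arguments are reflectance values of Landsat 8 bands B2..B7.
    The formulas follow an assumed reading of the table, not code
    published with the article.
    """
    return {
        # Six single bands
        "B2": b2, "B3": b3, "B4": b4, "B5": b5, "B6": b6, "B7": b7,
        # Band square indices
        "B4^2": b4 ** 2,
        "B7^2": b7 ** 2,
        # Band sum index
        "B3+B4": b3 + b4,
        # Band subtraction indices
        "B2-B3": b2 - b3,
        "B2-B4": b2 - b4,
        # Band mixture index
        "(B2+B3)/B4": (b2 + b3) / b4,
        # Normalized difference indices as defined in the table
        "MNDWI": (b3 - b5) / (b3 + b5),
        "NDVI": (b5 - b4) / (b5 + b4),
    }


# Example with made-up reflectance values for a single water pixel
features = band_features(0.05, 0.08, 0.06, 0.02, 0.01, 0.01)
print(len(features), features["MNDWI"], features["NDVI"])
```

The per-category counts in the table (6 + 2 + 1 + 2 + 1 + 1 + 1) sum to the 14 entries returned by the sketch.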
Table 2. The correlation between conductivity and geographical spatial data.
| Variable | Statistic | DEM | Precipitation | Grain Yield | Cropland Area | Proportion ¹ | Slope | Temperature |
|---|---|---|---|---|---|---|---|---|
| DEM | Pearson Correlation | -- | | | | | | |
| | Significance | -- | | | | | | |
| Precipitation | Pearson Correlation | 0.444 ** | -- | | | | | |
| | Significance | 0.000 | -- | | | | | |
| Grain yield | Pearson Correlation | −0.365 ** | 0.040 | -- | | | | |
| | Significance | 0.000 | 0.495 | -- | | | | |
| Cropland area | Pearson Correlation | −0.127 * | −0.068 | 0.569 ** | -- | | | |
| | Significance | 0.031 | 0.247 | 0.000 | -- | | | |
| Proportion ¹ | Pearson Correlation | −0.084 | 0.288 ** | 0.674 ** | 0.273 ** | -- | | |
| | Significance | 0.155 | 0.000 | 0.000 | 0.000 | -- | | |
| Slope | Pearson Correlation | 0.461 ** | −0.228 ** | −0.492 ** | −0.138 * | −0.500 ** | -- | |
| | Significance | 0.000 | 0.000 | 0.000 | 0.019 | 0.000 | -- | |
| Temperature | Pearson Correlation | −0.002 | 0.290 ** | −0.130 * | −0.339 ** | 0.244 ** | −0.328 ** | -- |
| | Significance | 0.980 | 0.000 | 0.026 | 0.000 | 0.000 | 0.000 | -- |
| Conductivity | Pearson Correlation | −0.492 ** | −0.171 ** | 0.215 ** | −0.107 | 0.263 ** | −0.350 ** | 0.186 ** |
| | Significance | 0.000 | 0.003 | 0.000 | 0.070 | 0.000 | 0.000 | 0.001 |
¹ Proportion is cropland area as a proportion of urban area. ** The correlation is significant at the 0.01 level (two-tailed). * The correlation is significant at the 0.05 level (two-tailed).
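The correlation tables pair each Pearson coefficient with a two-tailed significance value and a star flag. The following is a minimal sketch of how a coefficient and its flag could be produced. The helper names `pearson_r` and `significance_flag` are hypothetical, the sample values are made up, and the strict `<` thresholds are an assumption about how the 0.01 and 0.05 levels were applied.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sums of squared deviations
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def significance_flag(p_value):
    """Map a two-tailed p-value to the star notation used in the tables."""
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


# Made-up elevation (m) and conductivity (uS/cm) samples
dem = [120.0, 150.0, 200.0, 310.0, 450.0]
conductivity = [230.0, 215.0, 190.0, 170.0, 150.0]
print(round(pearson_r(dem, conductivity), 3))
print(significance_flag(0.031), significance_flag(0.003))
```

Applied to the reported values, a significance of 0.031 maps to one star and 0.003 to two stars, which matches the flags shown in Table 2.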
Table 3. The correlation between TN and geographical spatial data.
| Variable | Statistic | DEM | Precipitation | Grain Yield | Cropland Area | Proportion ¹ | Slope | Temperature |
|---|---|---|---|---|---|---|---|---|
| DEM | Pearson Correlation | -- | | | | | | |
| | Significance | -- | | | | | | |
| Precipitation | Pearson Correlation | 0.374 ** | -- | | | | | |
| | Significance | 0.000 | -- | | | | | |
| Grain yield | Pearson Correlation | −0.324 ** | 0.078 | -- | | | | |
| | Significance | 0.000 | 0.186 | -- | | | | |
| Cropland area | Pearson Correlation | −0.012 | −0.032 | 0.505 ** | -- | | | |
| | Significance | 0.840 | 0.591 | 0.000 | -- | | | |
| Proportion ¹ | Pearson Correlation | −0.116 * | 0.324 ** | 0.662 ** | 0.221 ** | -- | | |
| | Significance | 0.047 | 0.000 | 0.000 | 0.000 | -- | | |
| Slope | Pearson Correlation | 0.519 ** | −0.288 ** | −0.426 ** | −0.023 | −0.512 ** | -- | |
| | Significance | 0.000 | 0.000 | 0.000 | 0.694 | 0.000 | -- | |
| Temperature | Pearson Correlation | −0.217 ** | 0.325 ** | 0.001 | −0.285 ** | 0.367 ** | −0.548 ** | -- |
| | Significance | 0.000 | 0.000 | 0.983 | 0.000 | 0.000 | 0.000 | -- |
| TN | Pearson Correlation | −0.306 ** | −0.015 | 0.326 ** | −0.060 | 0.383 ** | −0.356 ** | 0.296 ** |
| | Significance | 0.000 | 0.003 | 0.000 | 0.070 | 0.000 | 0.000 | 0.001 |
¹ Proportion is cropland area as a proportion of urban area. ** The correlation is significant at the 0.01 level (two-tailed). * The correlation is significant at the 0.05 level (two-tailed).
Table 4. The correlation between TP and geographical spatial data.
| Variable | Statistic | DEM | Precipitation | Grain Yield | Cropland Area | Proportion ¹ | Slope | Temperature |
|---|---|---|---|---|---|---|---|---|
| DEM | Pearson Correlation | -- | | | | | | |
| | Significance | -- | | | | | | |
| Precipitation | Pearson Correlation | 0.461 ** | -- | | | | | |
| | Significance | 0.000 | -- | | | | | |
| Grain yield | Pearson Correlation | −0.372 ** | 0.011 | -- | | | | |
| | Significance | 0.000 | 0.852 | -- | | | | |
| Cropland area | Pearson Correlation | −0.097 | −0.064 | 0.588 ** | -- | | | |
| | Significance | 0.100 | 0.280 | 0.000 | -- | | | |
| Proportion ¹ | Pearson Correlation | −0.105 | 0.254 ** | 0.675 ** | 0.325 ** | -- | | |
| | Significance | 0.074 | 0.000 | 0.000 | 0.000 | -- | | |
| Slope | Pearson Correlation | 0.495 ** | −0.172 ** | −0.448 ** | −0.107 | −0.476 ** | -- | |
| | Significance | 0.000 | 0.003 | 0.000 | 0.069 | 0.000 | -- | |
| Temperature | Pearson Correlation | −0.083 | 0.264 ** | −0.132 * | −0.338 ** | 0.218 ** | −0.395 ** | -- |
| | Significance | 0.156 | 0.000 | 0.024 | 0.000 | 0.000 | 0.000 | -- |
| TP | Pearson Correlation | −0.463 ** | −0.264 ** | 0.012 | −0.130 * | −0.091 | −0.085 | −0.039 |
| | Significance | 0.000 | 0.000 | 0.840 | 0.027 | 0.120 | 0.151 | 0.504 |
¹ Proportion is cropland area as a proportion of urban area. ** The correlation is significant at the 0.01 level (two-tailed). * The correlation is significant at the 0.05 level (two-tailed).
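Table 5. Accuracy evaluation results of random forest, AdaBoost, and LightGBM models.
| Model | Parameter | Dataset | R² | MAE | MSE | RMSE |
|---|---|---|---|---|---|---|
| Random forest | Conductivity | Training | 0.95 | 9.39 | 174.06 | 13.19 |
| Random forest | Conductivity | Test | 0.67 | 27.92 | 1314.70 | 36.26 |
| Random forest | TN | Training | 0.76 | 0.27 | 0.13 | 0.36 |
| Random forest | TN | Test | 0.52 | 0.39 | 0.28 | 0.53 |
| Random forest | TP | Training | 0.73 | 0.02 | 0.00 | 0.03 |
| Random forest | TP | Test | 0.47 | 0.03 | 0.02 | 0.04 |
| AdaBoost | Conductivity | Training | 0.78 | 26.79 | 939.28 | 30.65 |
| AdaBoost | Conductivity | Test | 0.54 | 30.73 | 1622.65 | 40.28 |
| AdaBoost | TN | Training | 0.58 | 0.40 | 0.22 | 0.47 |
| AdaBoost | TN | Test | 0.25 | 0.52 | 0.45 | 0.68 |
| AdaBoost | TP | Training | 0.63 | 0.03 | 0.00 | 0.04 |
| AdaBoost | TP | Test | 0.29 | 0.04 | 0.00 | 0.06 |
| LightGBM | Conductivity | Training | 0.93 | 11.10 | 243.73 | 15.61 |
| LightGBM | Conductivity | Test | 0.59 | 28.41 | 1463.03 | 38.25 |
| LightGBM | TN | Training | 0.87 | 0.20 | 0.07 | 0.26 |
| LightGBM | TN | Test | 0.39 | 0.46 | 0.37 | 0.61 |
| LightGBM | TP | Training | 0.89 | 0.01 | 0.00 | 0.02 |
| LightGBM | TP | Test | 0.37 | 0.04 | 0.00 | 0.05 |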
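The four accuracy measures in Table 5 can be illustrated with a short sketch. The function name `regression_metrics` and the sample values are hypothetical, and R² is assumed here to be the coefficient of determination, 1 − SS_res/SS_tot; the article may have used a different definition, such as the squared correlation coefficient.

```python
import math


def regression_metrics(observed, predicted):
    """Compute R2, MAE, MSE and RMSE for paired observations and predictions."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error
    rmse = math.sqrt(mse)                   # root mean squared error
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)     # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot              # assumed definition of R2
    return {"R2": r2, "MAE": mae, "MSE": mse, "RMSE": rmse}


# Made-up conductivity values (uS/cm) at four monitoring sections
obs = [150.0, 180.0, 210.0, 240.0]
pred = [160.0, 175.0, 205.0, 230.0]
print(regression_metrics(obs, pred))
```

Because RMSE is the square root of MSE, the two columns can be cross-checked: for the random forest conductivity test row, the square root of 1314.70 is about 36.26, which agrees with the reported RMSE.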
Table 6. Average conductivity, TN, and TP values across the three rivers.
| River | Conductivity (μS/cm) | TN (mg/L) | TP (mg/L) |
|---|---|---|---|
| West-flowing Songhua River | 161.62 | 1.62 | 0.10 |
| Nenjiang River | 167.79 | 1.71 | 0.11 |
| East-flowing Songhua River | 212.72 | 2.07 | 0.14 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Yu, Z.; Yu, H.; Li, L.; Yu, J.; Yu, J.; Gao, X. Inferring Water Quality in the Songhua River Basin Using Random Forest Regression Based on Satellite Imagery and Geoinformation. Hydrology 2025, 12, 61. https://doi.org/10.3390/hydrology12030061
