Open Access Article

Using Real-Time Data and Unsupervised Machine Learning Techniques to Study Large-Scale Spatio–Temporal Characteristics of Wastewater Discharges and their Influence on Surface Water Quality in the Yangtze River Basin

1 School of Environment, Tsinghua University, Beijing 100084, China
2 Green Nest Smart Data Technologies (Beijing) Co. Ltd., 77 Shuangqing Rd, Beijing 100084, China
* Author to whom correspondence should be addressed.
Water 2019, 11(6), 1268; https://doi.org/10.3390/w11061268
Received: 16 May 2019 / Revised: 13 June 2019 / Accepted: 13 June 2019 / Published: 17 June 2019
(This article belongs to the Section Water Resources Management, Policy and Governance)
Most industrial wastewater worldwide, including in China, is still discharged directly to aquatic environments without adequate treatment. Because data are scarce and methods are few, the relationships between pollutants discharged in wastewater and those in surface water have not been fully revealed, and unsupervised machine learning techniques, such as clustering algorithms, have been neglected in related research fields. In this study, real-time monitoring data for chemical oxygen demand (COD), ammonia nitrogen (NH3-N), pH, and dissolved oxygen in the wastewater discharged from 2213 factories and in the surface water at 18 monitoring sections (sites) in 7 administrative regions in the Yangtze River Basin from 2016 to 2017 were collected and analyzed with the partitioning around medoids (PAM) and expectation–maximization (EM) clustering algorithms, the Welch t-test, the Wilcoxon test, and Spearman correlation. The results showed that, compared with the spatial cluster comprising unpolluted sites, the spatial cluster comprising heavily polluted sites, where more wastewater was discharged, was associated with wastewater that had relatively high COD (>100 mg L−1) and NH3-N (>6 mg L−1) concentrations and relatively low pH (<6), discharged from 15 industrial classes that respected the different discharge limits outlined in the pollutant discharge standards. The results also showed that the economic activities generating wastewater and the geographical distribution of the heavily polluted wastewater changed from 2016 to 2017, such that the concentration ranges of pollutants in discharges widened and the contributions from some emerging enterprises became more important. The correlations between the quality of the wastewater and that of the surface water strengthened when the whole-year data sets were reduced to the heavily polluted periods by EM clustering and water quality evaluation.
This study demonstrates how unsupervised machine learning algorithms can play an objective and effective role in mining real-time monitoring information and in highlighting spatio–temporal relationships between pollutants in wastewater discharges and surface water to support scientific water resource management.
Keywords: partitioning around medoids clustering algorithm; expectation–maximization clustering algorithm; point pollution sources; sewage outlets; real-time monitoring data; correlation relationship
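As a rough illustration of two of the techniques named in the abstract, the sketch below implements a simplified PAM-style k-medoids clustering and a Spearman rank correlation in plain Python. It is not the authors' code: the function names (`pam`, `ranks`, `spearman`) are hypothetical, the sample values are synthetic and only mimic the concentration ranges quoted in the abstract, the features are left unstandardized for brevity, and the EM clustering, Welch t-test, and Wilcoxon test are omitted.

```python
import random


def pam(points, k, seed=0, max_iter=100):
    """Simplified PAM-style k-medoids: greedy single-medoid swaps, Euclidean distance."""
    rng = random.Random(seed)
    n = len(points)
    # Precompute the full pairwise distance matrix.
    dist = [[sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5 for q in points]
            for p in points]

    def cost(medoids):
        # Total distance from each point to its nearest medoid.
        return sum(min(dist[i][m] for m in medoids) for i in range(n))

    medoids = rng.sample(range(n), k)  # random initial medoids (an assumption)
    best = cost(medoids)
    for _ in range(max_iter):
        improved = False
        for mi in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:mi] + [cand] + medoids[mi + 1:]
                c = cost(trial)
                if c < best - 1e-12:  # accept only strictly better swaps
                    medoids, best, improved = trial, c, True
        if not improved:
            break
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            r[order[t]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Synthetic (COD mg/L, NH3-N mg/L, pH) readings for six hypothetical outlets.
outlets = [
    (20, 0.5, 7.5), (25, 0.8, 7.2), (30, 1.0, 7.8),      # lightly polluted
    (120, 7.0, 5.5), (150, 8.5, 5.2), (110, 6.5, 5.8),   # heavily polluted
]
medoids, labels = pam(outlets, k=2)

# Synthetic paired series: discharge COD vs. surface-water COD.
discharge_cod = [110, 120, 135, 150, 160]
river_cod = [14, 15, 19, 18, 22]
rho = spearman(discharge_cod, river_cod)
```

In practice one would likely standardize the features before clustering, since COD otherwise dominates the Euclidean distance, and would use an established statistics package rather than hand-written routines.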
MDPI and ACS Style

Di, Z.; Chang, M.; Guo, P.; Li, Y.; Chang, Y. Using Real-Time Data and Unsupervised Machine Learning Techniques to Study Large-Scale Spatio–Temporal Characteristics of Wastewater Discharges and their Influence on Surface Water Quality in the Yangtze River Basin. Water 2019, 11, 1268.
