Article

Water-Quality Data Imputation with a High Percentage of Missing Values: A Machine Learning Approach

1
Instituto de Mecánica de los Fluidos e Ingeniería Ambiental (IMFIA), Facultad de Ingeniería, Universidad de la República, Montevideo 11300, Uruguay
2
Instituto de Computación (InCo), Facultad de Ingeniería, Universidad de la República, Montevideo 11300, Uruguay
*
Author to whom correspondence should be addressed.
Academic Editor: Ashwani Kumar Tiwari
Sustainability 2021, 13(11), 6318; https://doi.org/10.3390/su13116318
Received: 3 May 2021 / Revised: 30 May 2021 / Accepted: 1 June 2021 / Published: 2 June 2021
(This article belongs to the Special Issue Water Quality: Current State and Future Trends)
Monitoring surface-water quality, followed by water-quality modeling and analysis, is essential for generating effective strategies in surface-water-resource management. However, worldwide, and particularly in developing countries, water-quality studies are limited by the lack of complete and reliable datasets of surface-water-quality variables. In this context, several statistical and machine-learning models were assessed for imputing water-quality data at six monitoring stations located in the Santa Lucía Chico river (Uruguay), a mixed lotic and lentic river system. The main challenges of this study are the high percentage of missing data (between 50% and 70%) and the high temporal and spatial variability that characterizes the water-quality variables. The competing algorithms implement univariate and multivariate imputation methods: inverse distance weighting (IDW), Random Forest Regressor (RFR), Ridge (R), Bayesian Ridge (BR), AdaBoost (AB), Huber Regressor (HR), Support Vector Regressor (SVR) and K-nearest neighbors Regressor (KNNR). According to the results, more than 76% of the imputation outcomes are considered “satisfactory” (Nash-Sutcliffe efficiency, NSE > 0.45). Imputation performance is better at the monitoring stations located inside the reservoir than at those positioned along the mainstream. IDW was the model with the best imputation results, followed by RFR, HR and SVR. The approach proposed in this study is expected to help water-resource researchers and managers augment water-quality datasets and overcome the missing-data issue, thereby increasing the number of future water-quality studies.
Keywords: data scarcity; water quality; missing data; univariate imputation; multivariate imputation; machine learning; hydroinformatics
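
As an illustration of two ideas named in the abstract, the sketch below shows a generic inverse-distance-weighted estimate and the NSE score used to judge an imputation as "satisfactory" (NSE > 0.45). It is not the authors' implementation: the function names `idw_impute` and `nse`, the `power` parameter, and the sample values are assumptions made for this example, and the distances could be spatial or temporal depending on how IDW is applied.

```python
import math


def idw_impute(values, distances, power=2.0):
    """Estimate a missing value as an inverse-distance-weighted mean.

    values    -- observations at neighbouring points (NaN = also missing)
    distances -- strictly positive distances from the target to each neighbour
    power     -- exponent of the distance weighting (assumed default of 2)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for value, distance in zip(values, distances):
        if math.isnan(value):
            continue  # skip neighbours that are missing as well
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        return math.nan  # no usable neighbour
    return weighted_sum / weight_total


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance_sum = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / variance_sum


# Hypothetical example: two valid neighbours at equal distance, one missing.
estimate = idw_impute([2.0, math.nan, 6.0], [1.0, 2.0, 1.0])

# Hypothetical held-out observations versus imputed values.
score = nse([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
is_satisfactory = score > 0.45  # threshold quoted in the abstract
```

In an evaluation of this kind, known values would typically be masked, imputed, and then compared against the originals with `nse`; the masking strategy itself is not described in the abstract and is left out here.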
MDPI and ACS Style

Rodríguez, R.; Pastorini, M.; Etcheverry, L.; Chreties, C.; Fossati, M.; Castro, A.; Gorgoglione, A. Water-Quality Data Imputation with a High Percentage of Missing Values: A Machine Learning Approach. Sustainability 2021, 13, 6318. https://doi.org/10.3390/su13116318

AMA Style

Rodríguez R, Pastorini M, Etcheverry L, Chreties C, Fossati M, Castro A, Gorgoglione A. Water-Quality Data Imputation with a High Percentage of Missing Values: A Machine Learning Approach. Sustainability. 2021; 13(11):6318. https://doi.org/10.3390/su13116318

Chicago/Turabian Style

Rodríguez, Rafael, Marcos Pastorini, Lorena Etcheverry, Christian Chreties, Mónica Fossati, Alberto Castro, and Angela Gorgoglione. 2021. "Water-Quality Data Imputation with a High Percentage of Missing Values: A Machine Learning Approach" Sustainability 13, no. 11: 6318. https://doi.org/10.3390/su13116318
