Groundwater arsenic is an important environmental hazard in Uruguay; predicting its distribution is therefore essential to inform stakeholders. Moreover, arsenic occurrences in Uruguay are known to depend variably on depth and geology, arguably reflecting different processes controlling groundwater arsenic concentrations. Here, we present the distribution of groundwater arsenic in Uruguay as modelled by a variety of machine learning, basic expert system, and hybrid approaches. A pure random forest approach using 26 potential predictor variables produced a groundwater arsenic distribution model with very high accuracy (AUC = 0.92) that is consistent with known high groundwater arsenic hazard areas. These areas lie mainly in southwestern Uruguay, including the Paysandú, Río Negro, Soriano, Colonia, Flores, San José, Florida, Montevideo, and Canelones departments, where the main Mercedes, Cuaternario Oeste, Raigón, and Cretácico aquifers occur. A hybrid approach separating the country into sedimentary and crystalline aquifer domains resulted in only a slight improvement in the modelled high arsenic hazard distribution. However, a further hybrid approach that modelled shallow (<50 m) and deep (>50 m) aquifers separately identified more high hazard areas in shallow aquifers than the pure model did, in the Flores and Durazno departments and in the northwest corner of Florida department. Both hybrid models, considering depth (AUC = 0.95) and geology (AUC = 0.97), achieved higher accuracy than the pure random forest model. Hybrid machine learning models with expert selection of important environmental parameters may sometimes be a better choice than pure machine learning models, particularly where datasets are incomplete; perhaps counterintuitively, however, this is not always the case.
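As a hedged illustration of the depth-based hybrid scheme and of the AUC values quoted above, the sketch below routes each well to a separate classifier for the shallow (<50 m) or deep (>50 m) domain and then scores the combined predictions with the rank-based (Mann-Whitney) form of the ROC AUC. The function names, the `Model` type, the toy data, and the choice to assign wells at exactly 50 m to the deep domain are all assumptions made for illustration; in the study each domain would be a trained random forest rather than the stand-in lambdas used here.

```python
from typing import Callable, List, Sequence

# Shallow/deep boundary from the abstract; wells at exactly 50 m are assigned
# to the deep domain here, which is an assumption (the text gives only <50 and >50 m).
DEPTH_THRESHOLD_M = 50.0

Model = Callable[[Sequence[float]], float]  # hypothetical: features -> hazard score


def hybrid_predict(depths: Sequence[float],
                   features: Sequence[Sequence[float]],
                   shallow_model: Model,
                   deep_model: Model,
                   threshold: float = DEPTH_THRESHOLD_M) -> List[float]:
    """Score each well with the model assigned to its depth domain."""
    scores = []
    for depth, x in zip(depths, features):
        model = shallow_model if depth < threshold else deep_model
        scores.append(model(x))
    return scores


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the Mann-Whitney probability that a random positive
    (high-arsenic) well outscores a random negative one; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative well")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy usage: two hypothetical stand-in "models" that each read one feature.
depths = [10.0, 80.0, 30.0, 120.0]
features = [(0.9, 0.1), (0.2, 0.3), (0.7, 0.0), (0.5, 0.4)]
labels = [1, 0, 1, 0]  # 1 = above an arsenic threshold (illustrative only)
scores = hybrid_predict(depths, features, lambda x: x[0], lambda x: x[1])
print(auc(labels, scores))  # → 1.0
```

In a real workflow, a classifier fitted separately to each depth domain (for example, a scikit-learn `RandomForestClassifier` using its `predict_proba` output) would replace the lambdas; the routing and scoring logic would stay the same. The geology-based hybrid would follow the same pattern, with a sedimentary/crystalline flag in place of the depth threshold.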
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.