# Bayesian Machine Learning and Functional Data Analysis as a Two-Fold Approach for the Study of Acid Mine Drainage Events

## Abstract

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Fluvial System Description and Data Acquisition

^{3}/s), rainfall (mm), and temperature (°C), which can affect the biological properties of the aquatic system, as well as the behavior of other substances dissolved in the water.

#### 2.2. Mathematical Background

#### 2.2.1. Statistical Process Control

#### 2.2.2. Bayesian Networks

#### 2.2.3. Functional Data Analysis

## 3. Results

#### 3.1. Variability of the Data and Outlier Detection with SPC

#### 3.2. Variables Influence Analysis in pH Distribution

#### 3.3. Functional Analysis Approach

^{3}/s, with a maximum of 67.37 m^{3}/s on 20 December 2019 and a minimum of 0.46 m^{3}/s on 9 October 2017, with most outliers taking place between December and March of every year.

^{3}/s and 24.71 m^{3}/s. Consequently, their values are considerably above the third quartile of the flow data, which is 6.53 m^{3}/s, evidencing their outlying behavior. This suggests that the pH level can decrease, but once it reaches a certain point, the water flow loses its influence. In other words, there are scenarios in which an increase in the flow does not necessarily imply a further drop in the pH.

## 4. Discussion

## 5. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

1. Simate, G.S.; Ndlovu, S. Acid Mine Drainage: Challenges and Opportunities. J. Environ. Chem. Eng. **2014**, 2, 1785–1803.
2. Akcil, A.; Koldas, S. Acid Mine Drainage (AMD): Causes, Treatment and Case Studies. J. Clean. Prod. **2006**, 14, 1139–1145.
3. Monterroso, C.; Macías, F. Drainage Waters Affected by Pyrite Oxidation in a Coal Mine in Galicia (NW Spain): Composition and Mineral Stability. Sci. Total Environ. **1998**, 216, 121–132.
4. Tiwary, R.K. Environmental Impact of Coal Mining on Water Regime and Its Management. Water Air Soil Pollut. **2001**, 132, 185–199.
5. Campaner, V.P.; Luiz-Silva, W.; Machado, W. Geochemistry of Acid Mine Drainage from a Coal Mining Area and Processes Controlling Metal Attenuation in Stream Waters, Southern Brazil. An. Acad. Bras. Cienc. **2014**, 86, 539–554.
6. Alhamed, M.; Wohnlich, S. Environmental Impact of the Abandoned Coal Mines on the Surface Water and the Groundwater Quality in the South of Bochum, Germany. Environ. Earth Sci. **2014**, 72, 3251–3267.
7. Kicińska, A.; Pomykała, R.; Izquierdo-Diaz, M. Changes in Soil pH and Mobility of Heavy Metals in Contaminated Soils. Eur. J. Soil Sci. **2022**, 73, e13203.
8. Nordstrom, D.K. Hydrogeochemical Processes Governing the Origin, Transport and Fate of Major and Trace Elements from Mine Wastes and Mineralized Rock to Surface Waters. Appl. Geochem. **2011**, 26, 1777–1791.
9. Kim, J.J.; Kim, S.J. Seasonal Factors Controlling Mineral Precipitation in the Acid Mine Drainage at Donghae Coal Mine, Korea. Sci. Total Environ. **2004**, 325, 181–191.
10. Masindi, V. Recovery of Drinking Water and Valuable Minerals from Acid Mine Drainage Using an Integration of Magnesite, Lime, Soda Ash, CO2 and Reverse Osmosis Treatment Processes. J. Environ. Chem. Eng. **2017**, 5, 3136–3142.
11. Wright, I.A.; Paciuszkiewicz, K.; Belmer, N. Increased Water Pollution After Closure of Australia's Longest Operating Underground Coal Mine: A 13-Month Study of Mine Drainage, Water Chemistry and River Ecology. Water Air Soil Pollut. **2018**, 229, 55.
12. Hobbs, P.; Oelofse, S.H.H.; Rascher, J. Management of Environmental Impacts from Coal Mining in the Upper Olifants River Catchment as a Function of Age and Scale. Int. J. Water Resour. Dev. **2008**, 24, 417–431.
13. MITECO SAIH Network. Available online: https://www.miteco.gob.es/es/agua/temas/evaluacion-de-los-recursos-hidricos/SAIH/ (accessed on 8 November 2022).
14. MITECO SAICA Network. Available online: https://www.miteco.gob.es/es/agua/temas/estado-y-calidad-de-las-aguas/aguas-superficiales/programas-seguimiento/saica.aspx (accessed on 12 November 2022).
15. Yaroshenko, I.; Kirsanov, D.; Marjanovic, M.; Lieberzeit, P.A.; Korostynska, O.; Mason, A.; Frau, I.; Legin, A. Real-Time Water Quality Monitoring with Chemical Sensors. Sensors **2020**, 20, 3432.
16. Sambito, M.; Freni, G. Strategies for Improving Optimal Positioning of Quality Sensors in Urban Drainage Systems for Non-Conservative Contaminants. Water **2021**, 13, 934.
17. Ajami, N.K.; Hornberger, G.M.; Sunding, D.L. Sustainable Water Resource Management under Hydrological Uncertainty. Water Resour. Res. **2008**, 44, W11406.
18. Ovaskainen, O.; Tikhonov, G.; Norberg, A.; Guillaume Blanchet, F.; Duan, L.; Dunson, D.; Roslin, T.; Abrego, N. How to Make More out of Community Data? A Conceptual Framework and Its Implementation as Models and Software. Ecol. Lett. **2017**, 20, 561–576.
19. Gokdemir, C.; Li, Y.; Rubin, Y.; Li, X. Stochastic Modeling of Groundwater Drawdown Response Induced by Tunnel Drainage. Eng. Geol. **2022**, 297, 106529.
20. Ramsay, J.O.; Silverman, B.W. Functional Data Analysis, 2nd ed.; Springer New York LLC: New York, NY, USA, 2005; ISBN 978-0-387-40080-8.
21. Febrero, M.; Galeano, P.; Gonz, W. Outlier Detection in Functional Data by Depth Measures, with Application to Identify Abnormal NO_{x} Levels. Environmetrics **2008**, 19, 331–345.
22. Sancho, J.; Martínez, J.; Pastor, J.J.; Taboada, J.; Piñeiro, J.I.; García-Nieto, P.J. New Methodology to Determine Air Quality in Urban Areas Based on Runs Rules for Functional Data. Atmos. Environ. **2014**, 83, 185–192.
23. Sancho, J.; Iglesias, C.; Piñeiro, J.; Martínez, J.; Pastor, J.J.; Araújo, M.; Taboada, J. Study of Water Quality in a Spanish River Based on Statistical Process Control and Functional Data Analysis. Math. Geosci. **2016**, 48, 163–186.
24. Sancho, J.; Pastor, J.J.; Martínez, J.; García, M.A. Evaluation of Harmonic Variability in Electrical Power Systems through Statistical Control of Quality and Functional Data Analysis. Procedia Eng. **2013**, 63, 295–302.
25. Martínez Torres, J.; Pastor Pérez, J.; Sancho Val, J.; McNabola, A.; Martínez Comesaña, M.; Gallagher, J. A Functional Data Analysis Approach for the Detection of Air Pollution Episodes and Outliers: A Case Study in Dublin, Ireland. Mathematics **2020**, 8, 225.
26. Martínez Torres, J.; Garcia Nieto, P.J.; Alejano, L.; Reyes, A.N. Detection of Outliers in Gas Emissions from Urban Areas Using Functional Data Analysis. J. Hazard. Mater. **2011**, 186, 144–149.
27. Ordóñez, C.; Martínez, J.; de Cos Juez, J.F.; Sánchez Lasheras, F. Comparison of GPS Observations Made in a Forestry Setting Using Functional Data Analysis. Int. J. Comput. Math. **2012**, 89, 402–408.
28. Gorde, S.P.; Jadhav, M.V. Assessment of Water Quality Parameters: A Review. Int. J. Eng. Res. Appl. **2013**, 3, 2029–2035.
29. Kitchener, B.G.B.; Wainwright, J.; Parsons, A.J. A Review of the Principles of Turbidity Measurement. Prog. Phys. Geogr. **2017**, 41, 620–642.
30. Ribeiro, J.; Ferreira da Silva, E.; Li, Z.; Ward, C.; Flores, D. Petrographic, Mineralogical and Geochemical Characterization of the Serrinha Coal Waste Pile (Douro Coalfield, Portugal) and the Potential Environmental Impacts on Soil, Sediments and Surface Waters. Int. J. Coal Geol. **2010**, 83, 456–466.
31. Shewhart, W.A. Economic Control of Quality of Manufactured Product; Van Nostrand Company, Inc.: New York, NY, USA, 1931.
32. Champ, C.W.; Woodall, W.H. Exact Results for Shewhart Control Charts with Supplementary Runs Rules. Technometrics **1987**, 29, 393–399.
33. Zhang, S.; Wu, Z. Designs of Control Charts with Supplementary Runs Rules. Comput. Ind. Eng. **2005**, 49, 76–97.
34. Nelson, L.S. The Shewhart Control Chart—Tests for Special Causes. J. Qual. Technol. **1984**, 16, 237–239.
35. Electric, W. Statistical Quality Control Handbook; Western Electric Corporation: Indianapolis, Indiana, 1956.
36. Conrady, S.; Jouffe, L. Bayesian Networks and BayesiaLab—A Practical Introduction for Researchers, 1st ed.; Bayesia USA: Franklin, TN, USA, 2015; ISBN 0996533303.
37. S.A.S., B. BayesiaLab 2022. Available online: https://www.bayesia.com/articles/#!bayesialab-knowledge-hub/2022-bayesialab-conference (accessed on 20 January 2023).
38. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. **1948**, 27, 379–423.
39. Radicchi, F.; Krioukov, D.; Hartle, H.; Bianconi, G. Classical Information Theory of Networks. J. Phys. Complex. **2020**, 1, 25001.
40. S.A.S., B. Contingency Table Fit. Available online: https://www.bayesia.com/articles/#!bayesialab-knowledge-hub/key-concepts-contingency-table-fit (accessed on 22 January 2023).
41. Ramsay, J.O.; Silverman, B.W. Functional Data Analysis, 1st ed.; Springer International Publishing: New York, NY, USA, 2002; ISBN 9781461271666.
42. Di Blasi, J.I.P.; Martínez Torres, J.; García Nieto, P.J.; Alonso Fernández, J.R.; Díaz Muñiz, C.; Taboada, J. Analysis and Detection of Outliers in Water Quality Parameters from Different Automated Monitoring Stations in the Miño River Basin (NW Spain). Ecol. Eng. **2013**, 60, 60–66.
43. Díaz Muñiz, C.; García Nieto, P.J.; Alonso Fernández, J.R.; Martínez Torres, J.; Taboada, J. Detection of Outliers in Water Quality Monitoring Samples Using Functional Data Analysis in San Esteban Estuary (Northern Spain). Sci. Total Environ. **2012**, 439, 54–61.
44. Martínez, J.; Saavedra, Á.; García-Nieto, P.J.; Piñeiro, J.I.; Iglesias, C.; Taboada, J.; Sancho, J.; Pastor, J. Air Quality Parameters Outliers Detection Using Functional Data Analysis in the Langreo Urban Area (Northern Spain). Appl. Math. Comput. **2014**, 241, 1–10.
45. Lopez-Pintado, S.; Romo, J. On the Concept of Depth for Functional Data. J. Am. Stat. Assoc. **2009**, 104, 718–734.
46. Rigueira, X.; Araújo, M.; Martínez, J.; García-Nieto, P.J.; Ocarranza, I. Functional Data Analysis for the Detection of Outliers and Study of the Effects of the COVID-19 Pandemic on Air Quality: A Case Study in Gijón, Spain. Mathematics **2022**, 10, 2374.
47. Ojo, O.; Lillo, R.E.; Anta, A.F. Outlier Detection for Functional Data with R Package Fdaoutlier. arXiv **2021**.
48. Dai, W.; Genton, M.G. Multivariate Functional Data Visualization and Outlier Detection. J. Comput. Graph. Stat. **2018**, 27, 923–934.
49. Ministerio del Ambiente, Agua y Transición Ecológica. Real Decreto 817/2015, de 11 de septiembre, Por El Que Se Establecen Los Criterios de Seguimiento y Evaluación Del Estado de Las Aguas Superficiales y Las Normas de Calidad Ambiental. Available online: https://www.boe.es/eli/es/rd/2015/09/11/817 (accessed on 22 January 2023).

**Figure 1.** The geographical location of Fabero marked with a yellow dot, the coal mine in red, positioned to the east of Fabero, the water control station in green, and the two main rivers in the area, Cúa and Rioseco, in blue.

**Figure 2.** pH data represented in the $\bar{x}$ control chart with weekly rational subgroups and the 8 Nelson rules implemented for outlier detection, trend study, and variability analysis. The mean of all subgroups is represented by a black line, while the green, yellow, and red lines mark the $\pm 1\sigma$, $\pm 2\sigma$, and $\pm 3\sigma$ limits, respectively.

**Figure 3.** Flow data represented in the $\bar{x}$ control chart with weekly rational subgroups and the Nelson rules implemented for outlier detection, trend study, and variability analysis. The mean of all subgroups is represented by a black line, while the green, yellow, and red lines mark the $\pm 1\sigma$, $\pm 2\sigma$, and $\pm 3\sigma$ limits, respectively.
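The calculation behind the $\bar{x}$ charts of Figures 2 and 3 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the data are synthetic, and sigma is estimated from the average within-week standard deviation without any bias-correction constant. Only the first Nelson rule (one point beyond $\pm 3\sigma$) is checked.

```python
from statistics import mean, stdev


def xbar_chart(values, subgroup_size=7):
    """Return subgroup means, the centre line, and the 1-3 sigma limits."""
    # Split the series into consecutive, complete subgroups (one per week).
    n_groups = len(values) // subgroup_size
    subgroups = [values[i * subgroup_size:(i + 1) * subgroup_size]
                 for i in range(n_groups)]
    means = [mean(g) for g in subgroups]
    centre = mean(means)
    # Assumption: sigma of the subgroup means is the average within-subgroup
    # standard deviation divided by sqrt(subgroup size), with no bias correction.
    sigma = mean(stdev(g) for g in subgroups) / subgroup_size ** 0.5
    limits = {k: (centre - k * sigma, centre + k * sigma) for k in (1, 2, 3)}
    return means, centre, limits


def beyond_three_sigma(means, limits):
    """Nelson rule 1: indices of subgroup means outside the 3-sigma limits."""
    low, high = limits[3]
    return [i for i, m in enumerate(means) if m < low or m > high]


# Synthetic daily pH values: 51 ordinary weeks and one acidic week (index 30).
normal_week = [7.0, 7.1, 7.2, 7.1, 7.0, 6.9, 7.1]
acid_week = [4.4, 4.5, 4.6, 4.5, 4.4, 4.6, 4.5]
series = normal_week * 30 + acid_week + normal_week * 21

means, centre, limits = xbar_chart(series)
flagged = beyond_three_sigma(means, limits)
print(flagged)
```

The remaining Nelson rules (runs on one side of the mean, trends, and so on) would be added as further functions operating on the same `means` and `limits`.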

**Figure 4.** Supervised BN built with the Augmented Naïve Bayes algorithm. The graph presents a radial layout with the target node (pH) in the center. The color of the nodes represents the type of variable: chemical, physical, or temporal. The values of the arcs correspond to the RMI (%) analyses, and the values of the nodes are the variable contributions to the target node (%).

**Figure 5.** Results of the functional analysis on the water flow data. On the left side, a Cartesian representation of the magnitude–shape pair of values of each function is presented. The right side shows the functional plot of the weekly water flow values. Outliers are marked in red in both plots, and non-outliers are colored in blue.

**Figure 6.** Results of the functional analysis on the pH data. On the left side, a Cartesian representation of the magnitude–shape pair of values of each function is presented. The right side shows the functional plot of the weekly pH values. Outliers are marked in red in both plots, and non-outliers are colored in blue.
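The magnitude–shape pairs plotted in Figures 5 and 6 can be illustrated with a minimal Python sketch. It assumes a simplified univariate form of directional outlyingness: each point of a weekly curve is compared with the pointwise median and scaled by the median absolute deviation (MAD), and the mean and variance of these values over the week give the magnitude and shape coordinates. The function name, the zero-MAD fallback, and the synthetic curves are assumptions made for illustration. The text does not state which software or settings produced the figures.

```python
from statistics import mean, median, pvariance


def magnitude_shape(curves):
    """Return a (magnitude, shape) outlyingness pair for each curve.

    curves: list of equal-length lists, one list per week.
    """
    n_points = len(curves[0])
    centre, scale = [], []
    for t in range(n_points):
        column = [c[t] for c in curves]
        med = median(column)
        # Median absolute deviation as a robust pointwise scale.
        mad = median(abs(v - med) for v in column)
        centre.append(med)
        # Assumption: fall back to 1.0 when the MAD is zero.
        scale.append(mad if mad > 0 else 1.0)
    pairs = []
    for c in curves:
        out = [(c[t] - centre[t]) / scale[t] for t in range(n_points)]
        pairs.append((mean(out), pvariance(out)))
    return pairs


# Synthetic weekly curves: five ordinary weeks, one shifted, one with a spike.
base = [5.0, 5.2, 5.1, 5.3, 5.0, 4.9, 5.1]
typical = [[v + d for v in base] for d in (-0.2, -0.1, 0.0, 0.1, 0.2)]
shifted = [v + 20.0 for v in base]                 # magnitude outlier
spiked = base[:3] + [base[3] + 15.0] + base[4:]    # shape outlier
curves = typical + [shifted, spiked]

pairs = magnitude_shape(curves)
for index, (magnitude, shape) in enumerate(pairs):
    print(index, round(magnitude, 2), round(shape, 2))
```

In this toy data set, the shifted week stands out along the magnitude axis and the spiked week along the shape axis, which is the kind of separation the left-hand panels are meant to show.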

**Figure 7.** Outlier correlation analysis between variables in the database. The horizontal axis contains the number of outlying weeks that coincide in time between the variables studied in each case, while the vertical axis represents the mean values of the corresponding variable in each matching week, scaled between 0 and 1: (**a**) relationship between flow and pH; (**b**) relationship between rainfall and pH; (**c**) relationship between conductivity and pH; (**d**) relationship between turbidity and pH; (**e**) relationship between temperature and pH; (**f**) relationship between temperature and dissolved oxygen; and (**g**) relationship between rainfall, flow, and turbidity.

**Figure 8.** Risk assessment of an increase in the flow of the Cúa River on the pH and turbidity variables. The percentage distribution of the variables before the inferential analysis is based on an initial scenario where the average behavior of the river is reflected. The four ranges of pH, flow, and turbidity are those obtained from the $\bar{x}$ control charts.

Variable | Range 1 | Range 2 | Range 3 | Range 4 |
---|---|---|---|---|
Conductivity (µS/cm) | ≤106.89 $(x \le \bar{x} - 1\sigma)$ | ≤204.04 $(\bar{x} - 1\sigma, \bar{x}]$ | ≤301.2 $(\bar{x}, \bar{x} + 1\sigma]$ | >301.2 $(x > \bar{x} + 1\sigma)$ |
Dissolved oxygen (mg/L) | ≤8.81 $(x \le \bar{x} - 2\sigma)$ | ≤9.79 $(\bar{x} - 2\sigma, \bar{x} - 1\sigma]$ | ≤11.7 $(\bar{x} - 1\sigma, \bar{x} + 1\sigma]$ | >11.7 $(x > \bar{x} + 1\sigma)$ |
pH (u. pH) | ≤5.5 [49] | ≤6.5 $(5.5, \bar{x} - 2\sigma]$ | ≤7.1 $(\bar{x} - 2\sigma, \bar{x}]$ | >7.1 $(x > \bar{x})$ |
Water temperature (°C) | ≤6.73 $(x \le \bar{x} - 1\sigma)$ | ≤10.07 $(\bar{x} - 1\sigma, \bar{x}]$ | ≤13.4 $(\bar{x}, \bar{x} + 1\sigma]$ | >13.4 $(x > \bar{x} + 1\sigma)$ |
Turbidity (NTU) | ≤4.28 $(x \le \bar{x})$ | ≤11.35 $(\bar{x}, \bar{x} + 1\sigma]$ | ≤18.42 $(\bar{x} + 1\sigma, \bar{x} + 2\sigma]$ | >18.42 $(x > \bar{x} + 2\sigma)$ |
Rainfall (mm) | ≤2.29 $(x \le \bar{x})$ | ≤5.68 $(\bar{x}, \bar{x} + 1\sigma]$ | ≤9.07 $(\bar{x} + 1\sigma, \bar{x} + 2\sigma]$ | >9.07 $(x > \bar{x} + 2\sigma)$ |
Temperature (°C) | ≤5 $(x \le \bar{x} - 1\sigma)$ | ≤11.15 $(\bar{x} - 1\sigma, \bar{x}]$ | ≤17.3 $(\bar{x}, \bar{x} + 1\sigma]$ | >17.3 $(x > \bar{x} + 1\sigma)$ |
Water flow (m^{3}/s) | ≤5.54 $(x \le \bar{x})$ | ≤11.38 $(\bar{x}, \bar{x} + 1\sigma]$ | ≤17.22 $(\bar{x} + 1\sigma, \bar{x} + 2\sigma]$ | >17.22 $(x > \bar{x} + 2\sigma)$ |
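As an illustration of how the discretization table above can be applied, the following Python sketch maps a measurement to one of the four ranges of its variable. The thresholds are copied from the table for three of the variables. The dictionary keys and the function name are hypothetical, and the sketch does not claim to reproduce the discretization step of the software used in the study.

```python
from bisect import bisect_left

# Upper bounds of the first three ranges, copied from the table above.
THRESHOLDS = {
    "conductivity": [106.89, 204.04, 301.2],  # µS/cm
    "pH": [5.5, 6.5, 7.1],
    "flow": [5.54, 11.38, 17.22],             # m^3/s
}


def discretize(variable, value):
    """Return the 0-based index of the range that contains value."""
    # bisect_left keeps every range closed on the right, e.g. (5.5, 6.5].
    return bisect_left(THRESHOLDS[variable], value)


print(discretize("pH", 5.5))      # boundary value stays in the first range
print(discretize("flow", 24.71))  # above the last bound: fourth range
```

Using `bisect_left` rather than `bisect_right` is what places a value equal to a bound in the lower range, matching the right-closed intervals of the table.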

**Table 2.** Local impact analysis with pH target states: calculated Relative Binary Mutual Information (RBMI).

Variable | ≤5.5 ^{1} | (5.5, 6.5] | (6.5, 7.1] | >7.1 |
---|---|---|---|---|
Conductivity | 5.271% | 26.418% | 10.594% | 20.361% |
Flow | 17.072% | 27.955% | 11.091% | 19.603% |
Month | 15.260% | 21.067% | 7.699% | 11.168% |
Turbidity | 16.980% | 23.783% | 3.183% | 8.707% |
Water temperature | 7.268% | 9.921% | 3.614% | 5.945% |
Temperature | 7.907% | 10.968% | 3.494% | 5.793% |
Rainfall | 10.635% | 9.411% | 0.672% | 2.411% |
Dissolved oxygen | 4.291% | 4.961% | 0.919% | 1.629% |

^{1} A pH level of 5.5 is the limit defined by Spanish legislation [49] for the change of class condition from Good/Moderate to Moderate/Deficient for the river type R-T31 (small Cantabrian–Atlantic siliceous axes).
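The quantity reported in Table 2 can be approximated with a short Python sketch. The exact definition used by the modeling software is not given in the text, so the sketch makes an explicit assumption: the target is reduced to a binary indicator (is the pH in the chosen state or not), and the mutual information between a discretized variable and that indicator is divided by the entropy of the indicator. The function names and the toy labels are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def relative_binary_mi(predictor, target, state):
    """Mutual information between predictor and (target == state),
    divided by the entropy of that binary indicator (assumed definition)."""
    indicator = [t == state for t in target]
    h_indicator = entropy(indicator)
    if h_indicator == 0:
        return 0.0  # the state never occurs or always occurs
    joint = entropy(list(zip(predictor, indicator)))
    mutual_information = entropy(predictor) + h_indicator - joint
    return mutual_information / h_indicator


# Toy labels: flow range and pH range for four weeks.
ph = ["low", "low", "high", "high"]
flow_informative = ["high", "high", "low", "low"]
flow_uninformative = ["high", "low", "high", "low"]

print(relative_binary_mi(flow_informative, ph, "low"))
print(relative_binary_mi(flow_uninformative, ph, "low"))
```

Under this definition, a value near 100% means the variable fully determines whether the pH falls in the given state, and a value near 0% means it carries no information about it.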

**Table 3.** Results of the functional analysis on all variables. The first column contains the name of each variable. Columns 2 to 4 include the minimum, average, and maximum values of each variable, with the date of each extreme value in parentheses. Lastly, column 5 contains the number of weeks identified as outliers in each variable out of the 620 analyzed.

Variable | Min. | Avg. | Max. | Outliers |
---|---|---|---|---|
Rainfall | 0 (several days) | 2.61 | 93.2 (10 December 2017) | 84 |
pH | 4.3 (5 February 2017) | 7.07 | 8.0 (several days) | 84 |
Flow | 0.46 (9 October 2017) | 5.35 | 67.37 (20 December 2019) | 83 |
Conductivity | 21.8 (13 December 2020) | 204.10 | 642.2 (14 October 2011) | 81 |
Temperature | −4.0 (8 January 2021) | 11.26 | 26.0 (17 June 2017) | 83 |
Dissolved oxygen | 7.58 (11 November 2016) | 10.76 | 13.91 (8 November 2016) | 82 |
Turbidity | 0 (several days) | 4.27 | 199.90 (6 May 2012) | 81 |

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Rigueira, X.; Pazo, M.; Araújo, M.; Gerassis, S.; Bocos, E.
Bayesian Machine Learning and Functional Data Analysis as a Two-Fold Approach for the Study of Acid Mine Drainage Events. *Water* **2023**, *15*, 1553.
https://doi.org/10.3390/w15081553

**Chicago/Turabian Style**

Rigueira, Xurxo, María Pazo, María Araújo, Saki Gerassis, and Elvira Bocos.
2023. "Bayesian Machine Learning and Functional Data Analysis as a Two-Fold Approach for the Study of Acid Mine Drainage Events" *Water* 15, no. 8: 1553.
https://doi.org/10.3390/w15081553