Article

Machine Learning as a Diagnosis Tool of Groundwater Quality in Zones with High Agricultural Activity (Region of Campo de Cartagena, Murcia, Spain)

by Eva M. García-del-Toro 1,*, Sara García-Salgado 1, Luis F. Mateo 2, M. Ángeles Quijano 1 and M. Isabel Más-López 2

1 Departamento de Ingeniería Civil, Hidráulica, Energía y Medio Ambiente, ETSI Caminos, Canales y Puertos-Edificio Retiro, Universidad Politécnica de Madrid, Alfonso XII, 3, 28014 Madrid, Spain
2 Departamento de Matemática e Informática Aplicadas a las Ingenierías Civil y Naval, ETSI Caminos, Canales y Puertos-Edificio Retiro, Universidad Politécnica de Madrid, Alfonso XII, 3, 28014 Madrid, Spain
* Author to whom correspondence should be addressed.
Agronomy 2022, 12(12), 3076; https://doi.org/10.3390/agronomy12123076
Submission received: 26 October 2022 / Revised: 28 November 2022 / Accepted: 2 December 2022 / Published: 5 December 2022
(This article belongs to the Section Water Use and Irrigation)

Abstract

Groundwater is humanity's freshwater pantry, constituting about 97% of the available freshwater (excluding ice caps and glaciers). The 6th Sustainable Development Goal (SDG) of the UN 2030 Agenda aims to "Ensure availability and sustainable management of water and sanitation for all", a goal of special significance in arid and semi-arid regions. The region of Campo de Cartagena (Murcia, Spain) has one of the most technologically advanced and productive irrigation systems in Europe, and, as a result of this intensive agriculture, its groundwater has serious chemical quality problems. To classify and predict the groundwater quality of this region, which may later facilitate its management, two machine learning models (Naïve-Bayes and Decision-tree) are proposed. These models do not require great computing power and were developed from a small dataset using the KNIME (KoNstanz Information MinEr) tool. Their accuracy was tested with the corresponding confusion matrices, and both models proved highly accurate. The results showed that groundwater quality was higher in the northern and western zones. This may be due to the presence in the north of the Andalusian aquifer, the deepest in Campo de Cartagena, and in the west to the predominance of rainfed crops, where less water, coming mainly from rainfall, is available to leach fertilizers.

1. Introduction

Excluding the polar ice caps and glaciers, groundwater accounts for about 97% of the Earth's freshwater reserves, making it one of the most important water resources on the planet [1]. Its relevance has traditionally been linked to supply tasks such as drinking water, but it also plays a key role in many industrial processes, for example as process cooling water, as well as in the agricultural and livestock sector [2]. It is also a natural resource of incalculable value, since it is essential in the hydrological cycle: it helps maintain the water balance between groundwater and surface water and cushions the negative effects of droughts [3], which is especially crucial in arid or semi-arid zones with smaller amounts of exploitable surface water [4]. In addition, the discharge and recharge cycles of aquifers sustain a multitude of rivers, streams, ponds, and wetlands, which favor biodiversity and enable the maintenance of multiple ecosystems [5]. Preserving the quality of groundwater is therefore essential, particularly because climate change is making rainfall increasingly scarce and irregular in the countries of the Mediterranean basin.
One of the main pollutants in groundwater is nitrate, owing to its high solubility in water and its poor retention in soils [6]. Nitrogen compounds also occur in soils from natural sources, such as the weathering of igneous rocks, atmospheric deposition, and bacterial fixation of atmospheric nitrogen, which, through the nitrogen cycle, is transformed into ammonium, then into nitrite, and finally into nitrate. Nitrate is stable under oxic conditions and can remain in the aquifer for a long time [7]. In recent years, nitrate in groundwater has increased sharply, often exceeding the World Health Organization (WHO) guideline value of 50 mg L−1 [8], especially in many parts of Africa and Asia, but also, in percentage terms, in Europe [9]. This contamination is caused by the discharge of poorly treated domestic and industrial wastewater, livestock manure, and landfill leachates [7,9], as well as by the use of nitrogen fertilizers, especially in zones where agriculture is highly developed and technologically advanced [10,11]. In developing countries, by contrast, the main source of nitrate in groundwater is the lack of adequate sanitation and water purification [11].
In the region of Campo de Cartagena, the water supply is irregular, with very critical situations during periods of drought, as has been the case in recent irrigation campaigns [12]. The region also has one of the most technologically advanced and productive irrigation systems in Europe, with an important associated agri-food industry, together with major tourism-driven urban development [13]. All this has led to very intensive use of groundwater. Besides being scarce, groundwater in Campo de Cartagena has serious chemical quality problems, related above all to high salinity and to nitrate from the region's intense agricultural activity [14,15].
The 6th Sustainable Development Goal (SDG) promulgated by the UN [16] aims to "Ensure availability and sustainable management of water and sanitation for all". Effective tools are therefore needed to help diagnose, manage, and conserve groundwater, especially in arid or semi-arid zones, where it is an indispensable resource for the population and its economic development. Groundwater management usually involves sampling and measuring water quality, which in many cases is a slow and costly process. It is therefore essential to develop theoretical models that can predict the state of groundwater and its future evolution.
In recent years, several studies on agricultural productivity [17] and on groundwater pollution modeling, especially of nitrate, have relied on machine learning models, which greatly facilitate both the study of groundwater quality and its subsequent management. Gholami and Booij (2022) [18] used three machine learning methods (Deep Neural Network (DNN), Extreme Gradient Boosting (EGB), and Multiple Linear Regression (MLR)) to predict nitrate contamination of groundwater in the Mazandaran plain (northern Iran). The EGB method provided the best results. Among the parameters considered, which were obtained through GIS, distance from industries, population density, depth to groundwater, and evaporation rate were the most important. Awais et al. (2021) [19] used several machine learning approaches to assess nitrate pollution risks along the Karakoram Highway (Pakistan), applying Support Vector Machine (SVM), Multivariate Discriminant Analysis (MDA), and Boosted Regression Trees (BRT) models. The results were used to produce nitrate pollution vulnerability maps, with the best results achieved by combining the outputs of the three models. El Bilali et al. (2021) [20] used different machine learning methods to predict groundwater quality for use in irrigation. The study was carried out in the Berrechid aquifer (NW Morocco), using 520 water samples and 14 physicochemical parameters. The Adaptive Boosting (AdaBoost) and Random Forest (RF) models gave the best predictions, although the Artificial Neural Network (ANN) and Support Vector Regression (SVR) models were less sensitive to the input variables and therefore more generalizable and useful at a global level. Alkindi et al. (2021) [21] evaluated the efficiency of different Bayesian artificial intelligence methods for predicting nitrate concentration in groundwater in the Marvdasht basin (Fars, Iran). Among the 11 parameters used in the study, potassium level had the greatest effect on nitrate contamination, followed by rainfall, altitude, depth to groundwater, and distance to residential zones. The Bayesian Additive Regression Tree (BART) model was the most efficient.
The main objective of this work was to classify and predict the quality of groundwater in the region of Campo de Cartagena, in order to identify the zones most affected by fertilizer use and saline intrusion, which may later facilitate its management. For this purpose, two supervised classification machine learning models, Naïve-Bayes and Decision-tree, were implemented on the analytical platform KNIME (KoNstanz Information MinEr). The accuracy of both models was tested by means of the corresponding confusion matrices. Comparing the results with the map of crops and land uses of Campo de Cartagena allowed us to evaluate the usefulness of the proposed theoretical models for classifying groundwater quality in the studied zones and for its further management.

2. Materials and Methods

2.1. Location of the Studied Zone

Campo de Cartagena is a natural region of Murcia (Spain). Located in the southeast of the Iberian Peninsula, it forms a wide plain of 1240 km2 that extends from the Sierra de Carrasco (37.82222° N, 1.28833° W) to the Mediterranean Sea (37.60889° N, 0.71917° W), toward which it slopes gently. The region includes protected natural zones such as the Regional Park of Las Salinas, the Arenales de San Pedro del Pinatar, the Protected Landscape of Cabezo Gordo, and the Open Spaces and Islands of the Mar Menor. It has a semi-arid climate, with an average temperature of 17.5 °C and an average annual rainfall of 300 mm that is unevenly distributed, falling mostly in intense episodes during spring and autumn. In 2000, land use was predominantly agricultural (76%), followed by forestry (15%) and urban (9%).
According to the climatic atlas of Spain, the studied area has an arid climate of type BWh (warm desert) [22]. This climate type occurs in small areas of the southeast of the Iberian Peninsula, in the Spanish provinces of Almeria, Murcia, and Alicante. The average annual temperature is 18 °C, with an irregularly distributed total annual rainfall of 312 mm. A climograph of Campo de Cartagena, Murcia, is shown in Figure 1.
The hydrogeological system consists of several layers with aquifers made up of detrital sediments, sandstone, and limestone. Three aquifers stand out for their extent and use, separated by aquitards generally composed of marl and clay. The Quaternary unconfined aquifer, composed mainly of sand, conglomerates, and sandstone with interbedded silt and clay, is the shallowest and is connected to the Mar Menor. It occupies the largest area of Campo de Cartagena (1135 km2) and is recharged by vertical infiltration from precipitation and agricultural irrigation, which contaminates it with fertilizer-derived compounds such as nitrate. The intermediate Pliocene aquifer lies deeper and is composed mainly of sandstone, whereas the deep Andalusian aquifer (upper Miocene) is composed of conglomerates with interbedded sand, marl, and bioclastic limestone. These deeper aquifers occupy areas of 817 and 570 km2, respectively, and the latter is thickest in the north. The deep confined aquifers are recharged by infiltration of effective rainfall, but also through connections with the Quaternary aquifer created by poorly constructed irrigation wells. Discharge occurs by pumping, mainly from the Andalusian and Pliocene aquifers, and through outflows from the Quaternary aquifer to the Mar Menor and the Mediterranean Sea. The piezometric levels of the deep aquifers depend on the amount of surface irrigation water supplied by the Tajo–Segura water transfer, which is reduced in times of drought [15,24,25].

2.2. Origin of the Data

The physicochemical parameters of groundwater selected for this study were pH, nitrate, chloride, sulfate, and electrical conductivity, for the period from March 2014 to November 2020 (282 samples). Data were provided by the Consejería de Agua, Agricultura, Ganadería, Pesca y Medio Ambiente of the Region of Murcia and the Confederación Hidrográfica del Segura. Groundwater samples were collected following the standard ISO 5667-11 [26]. Samples were analyzed by the Water Quality Laboratory of the Confederación Hidrográfica del Segura. Electrical conductivity and pH were determined with electrometric probes following internal methods based on the corresponding Standard Methods (SM 2510-B and SM 4500-H) [27]. Nitrate, chloride, and sulfate were determined by ion chromatography following an internal method based on UNE-EN ISO 14911 [28]. The location of the studied region and the 47 sampling points considered are shown in Figure 2, and the geographic coordinates of these sampling points, classified into four zones (center, north, south, and west), are shown in Table 1.
Table 2 shows the minimum, maximum, mean, and standard deviation of the physicochemical parameters considered in this study for each of the four zones, together with the number of values in each case (N; 282 values in total).
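As an illustration of how per-zone summaries like those in Table 2 can be computed, the following minimal sketch uses only the Python standard library. The record layout (a list of dicts with a "zone" key and one key per parameter) and the function name `summarize_by_zone` are hypothetical, and the example values are made up; this is not the authors' procedure.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_zone(records, parameter):
    """Return {zone: (N, min, max, mean, sd)} for one parameter.

    `records` is a hypothetical layout: a list of dicts with a "zone" key and
    one key per physicochemical parameter. Missing values (None) are skipped,
    so N can differ between parameters, as in Table 2.
    """
    values_by_zone = defaultdict(list)
    for rec in records:
        value = rec.get(parameter)
        if value is not None:
            values_by_zone[rec["zone"]].append(value)
    summary = {}
    for zone, values in values_by_zone.items():
        # The sample standard deviation needs at least two values.
        sd = stdev(values) if len(values) > 1 else 0.0
        summary[zone] = (len(values), min(values), max(values), mean(values), sd)
    return summary


# Illustrative, made-up nitrate values (mg/L); not data from this study.
example = summarize_by_zone(
    [
        {"zone": "north", "nitrate": 20.0},
        {"zone": "north", "nitrate": 40.0},
        {"zone": "south", "nitrate": 120.0},
        {"zone": "south", "nitrate": None},
    ],
    "nitrate",
)
print(example)
```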

2.3. Methodology of Machine Learning Models

The methods selected in this work, Naïve-Bayes and Decision-tree [30], are based on machine learning and were implemented on the KNIME platform [31], which requires little computing power and is widely available because it is open source. Machine learning models are mathematical models used to determine the relationship between their inputs and outputs; the model parameters are adjusted or estimated from the input data by means of a learning algorithm. A classification model is generally built in two phases: a training phase, in which the model itself is built and its parameters are adjusted using the so-called training set, and a testing phase, in which the developed model is applied to new data (the test set), on which its performance is finally evaluated. Of the 282 data, 70% were used in the training phase and 30% in the testing phase.
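The 70/30 partition described above can be sketched in plain Python as follows. This is not the authors' KNIME workflow; the function name `train_test_split` and the `seed` parameter are illustrative assumptions, and the way the training size is rounded here (down) may differ from what KNIME does.

```python
import random


def train_test_split(samples, train_fraction=0.7, seed=None):
    """Randomly partition samples into a training set and a test set.

    The training size is rounded down. `seed` is a hypothetical parameter
    added for reproducibility; the paper does not report one.
    """
    shuffled = list(samples)  # copy so the caller's data are not reordered
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# 282 data, as in this study, identified here only by their index.
train, test = train_test_split(range(282), seed=42)
print(len(train), len(test))
```

With 282 data, this rounding gives 197 training and 85 test data.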
Naïve-Bayes models are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naïve) assumptions of independence between features; i.e., the more independent the variables are from each other, the more reliable the model is. They are highly scalable, requiring a number of parameters linear in the number of variables (features/predictors) of a learning problem. Maximum-likelihood training can be performed by evaluating a closed-form expression in linear time, rather than by the costly iterative approximation used by many other types of classifiers.
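The following sketch illustrates the idea described above. The classifier picks the class C that maximizes P(C) · Π p(x_i | C), where each feature x_i is treated as independent given the class. The sketch assumes Gaussian class-conditional densities for the numeric parameters. The paper does not state how KNIME models them, so this is an assumption. Training is a single closed-form pass that computes the class priors and the per-feature means and variances. The function names and toy values are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Closed-form maximum-likelihood fit: class priors plus per-feature mean and variance."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    n_total = len(y)
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))  # one tuple of values per feature
        means = [sum(col) / n for col in columns]
        # MLE variance (divide by n); the small floor avoids division by zero.
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-9)
                     for col, m in zip(columns, means)]
        model[label] = (n / n_total, means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior, treating features as independent."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            # Log of the Gaussian density of feature value v given the class.
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy data: [nitrate (mg/L), conductivity (µS/cm)]; not data from this study.
X = [[10, 500], [20, 600], [150, 3000], [200, 3500]]
y = ["medium", "medium", "low", "low"]
model = fit_gaussian_nb(X, y)
pred_a = predict_gaussian_nb(model, [15, 550])
pred_b = predict_gaussian_nb(model, [180, 3200])
print(pred_a, pred_b)
```

Log probabilities are summed rather than multiplying densities, to avoid numerical underflow when there are many features.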
Decision-trees, on the other hand, are classification models that use a tree structure to represent multiple decision paths, each of which leads to a different way of classifying an input sample. Node impurity was calculated with the Gini index, and the tree was not pruned because of its small size.
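The following minimal sketch shows how the Gini index can guide a single split of a decision tree. A full tree applies this step recursively to each child node. The sketch is not KNIME's implementation. The function names and toy values are hypothetical, and it considers only one numeric feature and binary splits of the form "value ≤ t".

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Find the split 'value <= t' on one numeric feature that minimizes weighted Gini impurity.

    Returns (threshold, impurity); the threshold is None if no split improves
    on the impurity of the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_t, best_impurity = None, gini(labels)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbors
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_t, best_impurity = t, impurity
    return best_t, best_impurity


# Made-up nitrate values (mg/L) and quality labels; not data from this study.
split = best_threshold([10, 30, 60, 200], ["medium", "medium", "low", "low"])
print(split)
```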
Both models, Naïve-Bayes and Decision-tree, take as input the data loaded into the KNIME tool and, after training, classify the groundwater at the sampling points as being of high, medium, or low quality according to European and Spanish regulations.
The accuracy of the models was evaluated with confusion matrices, in which a prediction is considered correct when the actual and predicted classes coincide (a cell on the main diagonal of the matrix) and incorrect when they differ. In addition, the correlation matrix between the input variables was calculated, since strong dependence between variables violates the independence assumption of Naïve-Bayes and therefore helps to decide which model is more reliable for classifying groundwater quality.
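To make the evaluation concrete, here is a minimal sketch of a three-class confusion matrix and the accuracy derived from it. In this sketch, rows hold the actual classes and columns the predicted classes. This orientation is a convention chosen for illustration. The names `confusion_matrix`, `accuracy`, and `CLASSES` are hypothetical, and the toy labels are made up.

```python
from collections import Counter

CLASSES = ("low", "medium", "high")


def confusion_matrix(actual, predicted, classes=CLASSES):
    """Count samples per (actual, predicted) pair; rows are actual, columns are predicted."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


def accuracy(matrix):
    """Fraction of samples on the main diagonal, i.e., where actual class == predicted class."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Made-up labels for four test samples; not results from this study.
actual = ["low", "low", "medium", "high"]
predicted = ["low", "medium", "medium", "medium"]
cm = confusion_matrix(actual, predicted)
acc = accuracy(cm)
print(cm, acc)
```

Off-diagonal cells show which classes are confused. For example, the cell in row "high", column "medium" counts high-quality samples that were predicted as medium.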
In Figure 3, the flow chart of the general process of the machine learning models applied is shown.
This general process was applied with the specific models used in this work. The output schemes of the Naïve-Bayes and Decision-tree models built in KNIME are shown in Figure 4 and Figure 5, respectively; they show how each model implements the general process flow chart (see Figure 3).

3. Results and Discussion

As previously mentioned, the Naïve-Bayes and Decision-tree models are used in this study to classify the groundwater quality of the sampling points in Campo de Cartagena as high, medium, or low. The classification is based on the values established by European regulations (Directive (EU) 2020/2184 of 16 December 2020 on the quality of water intended for human consumption) [32]; Spanish regulations (RD 47/2022, of 18 January, on the protection of water against nitrate pollution from agricultural sources) [33]; and the Reference Levels (RL) (calculated as the 90th percentile of historical data from 1964 to 2007 in Campo de Cartagena) of appendix II, part B, of the Groundwater Directive, transposed into RD 1514/2009, of 2 October [34], which establish the Threshold Values (TV) indicative of saline intrusion (corresponding to high levels of chloride, sulfate, and electrical conductivity).
Thus, groundwater is of high quality if it complies with the most restrictive parameter values, which indicates that it is not affected by agricultural contamination and can be fit for human consumption. By contrast, low-quality groundwater is clearly affected by nitrate contamination and/or saline intrusion and therefore requires treatment before either use or discharge. Finally, groundwater is of medium quality when its nitrate concentration does not exceed the maximum value of 50 mg L−1 and the remaining parameters do not exceed the TV based on the RL of the zone. Medium-quality groundwater does not require treatment before discharge and may be suitable for irrigation, although, depending on the crop, it may need to be desalinated. The values of the parameters considered in these regulations are summarized in Table 3.
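The rules above can be written as a small rule-based labeling function that produces the "actual quality" classes. The drinking-water limits used here are the parametric values of Directive (EU) 2020/2184: nitrate 50 mg/L, chloride and sulfate 250 mg/L, conductivity 2500 µS/cm at 20 °C, and pH 6.5–9.5. We assume these are the "most restrictive" values summarized in Table 3, which should be checked against that table. The regional saline-intrusion TVs are not given in this text, so they are a hypothetical input. The threshold numbers in the usage example are illustrative only, not the Campo de Cartagena values.

```python
# Parametric values for water intended for human consumption (Directive (EU) 2020/2184);
# assumed here to be the "most restrictive" values of Table 3.
DRINKING_LIMITS = {
    "nitrate": 50.0,        # mg/L
    "chloride": 250.0,      # mg/L
    "sulfate": 250.0,       # mg/L
    "conductivity": 2500.0, # µS/cm at 20 °C
}
DRINKING_PH_RANGE = (6.5, 9.5)
NITRATE_LIMIT = 50.0  # mg/L


def classify_quality(sample, regional_tv):
    """Label a sample 'high', 'medium', or 'low' following the rules described in the text.

    `regional_tv` holds the zone's saline-intrusion threshold values for chloride,
    sulfate, and conductivity (hypothetical input; take the real values from Table 3).
    """
    ph_low, ph_high = DRINKING_PH_RANGE
    meets_drinking = ph_low <= sample["pH"] <= ph_high and all(
        sample[k] <= limit for k, limit in DRINKING_LIMITS.items()
    )
    if meets_drinking:
        return "high"
    below_tv = all(sample[k] <= regional_tv[k] for k in ("chloride", "sulfate", "conductivity"))
    if sample["nitrate"] <= NITRATE_LIMIT and below_tv:
        return "medium"
    return "low"


# Illustrative TVs and samples only; not values from this study.
tv = {"chloride": 1000.0, "sulfate": 1000.0, "conductivity": 5000.0}
base = {"pH": 7.5, "nitrate": 20.0, "chloride": 100.0, "sulfate": 200.0, "conductivity": 1500.0}
label_clean = classify_quality(base, tv)
label_sulfate = classify_quality({**base, "sulfate": 400.0}, tv)  # above 250 mg/L, below TV
label_nitrate = classify_quality({**base, "nitrate": 120.0}, tv)  # above 50 mg/L nitrate
print(label_clean, label_sulfate, label_nitrate)
```

The second sample mirrors the kind of case discussed for sampling point N9. It exceeds the 250 mg/L sulfate limit for human consumption but stays below the regional TV, so it is labeled medium.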
The correlation between the parameters used (variables) was studied through their correlation matrix, which helps in choosing the most appropriate model. The results obtained are shown in Table 4.
The Pearson correlation coefficients showed a strong correlation between electrical conductivity and the chloride and sulfate concentrations (p < 0.01), whereas nitrate concentration was more weakly correlated with all the other variables considered.
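A Pearson correlation matrix like the one in Table 4 can be sketched with the standard library as follows. The function names and the made-up columns are hypothetical. This sketch computes only the coefficients r; the p-values reported in the text also require the t distribution and are not computed here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant (zero variance).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return {(name_i, name_j): r} for every ordered pair of variables in `columns`."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Made-up measurements; not data from this study.
columns = {
    "conductivity": [1000, 2000, 3000, 4000],  # µS/cm
    "chloride": [200, 400, 600, 800],          # mg/L
    "nitrate": [50, 10, 80, 30],               # mg/L
}
r = correlation_matrix(columns)
print(round(r[("conductivity", "chloride")], 3), round(r[("conductivity", "nitrate")], 3))
```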

3.1. Naïve-Bayes Model

The data used and the results provided by the Naïve-Bayes model are summarized in Table 5. The column labeled "actual quality" gives the actual water quality (low, medium, or high) at the sampling points randomly selected by the model for the testing phase, determined according to European and Spanish regulations [32,33,34]. The column labeled "predicted quality" gives the water quality predicted by the model for the same sampling points.
The Naïve-Bayes model identified two sampling points (22 data) with high-quality groundwater, one located in the north and the other in the south. It also classified 10 sampling points as medium quality across all four zones, mostly in the north: center (one sampling point, two data), north (six sampling points, 22 data), west (one sampling point, one datum), and south (two sampling points, two data).

3.2. Decision-Tree Model

The data used and the results provided by the Decision-tree model are summarized in Table 6, which presents the actual and predicted groundwater quality in the same way as Table 5.
The Decision-tree model did not classify any sampling point as having high-quality groundwater. However, it classified more sampling points as medium quality: 14 sampling points (28 data), which were also mainly located in the north (seven sampling points, 16 data), with four sampling points in the south (five data), two in the west (five data), and only one in the center (two data).

3.3. Evaluation of Model Accuracy

The accuracy of both models was determined from the confusion matrices. A prediction was considered satisfactory when the row and column of the matrix cell coincided, that is, when the actual class matched the predicted class. It was considered unsatisfactory when they differed. The results of the confusion matrices are shown in Table 7.
According to the confusion matrices, the Naïve-Bayes model misclassified five data, corresponding to sampling points C1, N3, N9, and S5. Similarly, the Decision-tree model misclassified five data, corresponding to sampling points C8, N3, N11, and S5.
The confusion matrices therefore showed the same level of accuracy for both models. However, the correlation analysis revealed strong dependence between some variables, which generally degrades the accuracy of Bayesian methods. For this reason, the Decision-tree method is recommended for future predictive applications because of its greater statistical robustness, even though no such effect was observed on the Naïve-Bayes model in this case.

3.4. Groundwater Quality Results by Machine Learning Methods and Its Likely Relation with the Crops in the Studied Region

As previously mentioned, the region of Campo de Cartagena has one of the most technologically advanced and productive irrigation systems in Europe, with an important associated agri-food industry. The total contribution of agriculture and the associated agri-food industry to gross domestic product (GDP) exceeds one million euros, generating direct employment for more than 40,000 people [35]. The predominant crops, in order of importance, are open-air horticultural crops (lettuce, melon, artichoke, and broccoli), citrus (lemon, orange, and mandarin), and greenhouse horticultural crops (bell pepper). A forest zone is located in the northern part of the region [36]. Figure 6 shows the map of crops and land uses in the region of Campo de Cartagena.
The extent of the cultivated area in each sampling zone, together with the crop system and crop types, is shown in Table 8, based on information from [36].
As can be observed, irrigated land is by far the most important crop system in the center and northern zones, representing 78% and 80%, respectively, of the total area. In the southern zone, forestry is the most important system, followed by irrigated land (47% and 33%, respectively). In the western zone, however, rainfed farming is the most important crop system (53%).
In terms of cultivated zones, both irrigated and rainfed land can worsen groundwater quality, since part of the nitrate used in crop fertilization is incorporated into the groundwater. Most of the irrigated land is devoted to intensive vegetable cultivation, which allows a higher number of harvests per year. Vegetables therefore stand out among irrigated crops as the ones that contribute the most fertilizer to the soil, so more fertilizer is applied on irrigated land than elsewhere. Table 9 shows the irrigated area of each crop group per sampling zone, based on information from [36].
In Campo de Cartagena, the nitrogen balance (the difference between the total nitrogen supplied and the crop requirements) ranges from 10 to 70 kg ha−1 year−1 [37]. There is therefore a surplus, part of which passes into the groundwater. Of irrigated and rainfed land, irrigated land is usually where groundwater quality may be worse, because irrigation water is more abundant there. In rainfed areas, by contrast, most of the water available to crops comes from rainfall, while on irrigated land fertilizer is usually applied by adding it to the irrigation water.
Most of the water resources available for irrigation come from the water transfer between the rivers Tajo and Segura, although smaller amounts are also supplied from other sources (surface water from the basin, desalinated water, and reused water). It should be noted that the zone is subject to great irregularity in the availability of its water supplies, with very critical situations due to periods of drought [12]. Along with the agricultural activity, there has been great urban development in the region [13]. All this has contributed to a very intensive use of groundwater in Campo de Cartagena.
The Naïve-Bayes model classified two samples, from sampling points N9 and S10, as having the highest quality. However, its prediction for sampling point N9 was wrong. That sample is actually of medium quality, because its sulfate concentration exceeded the limit for water intended for human consumption (250 mg L−1), although it was below the TV for the region. The Decision-tree model, in contrast, classified no samples as high quality and made no errors in this class.
Both models identified a similar number of samples with medium groundwater quality (27 and 28 data for the Naïve-Bayes and Decision-tree models, respectively), most of them taken at points in the northern zone. Although irrigated land is the most important crop system in this zone, the Andalusian aquifer predominates there [38]. Groundwater at these sampling points, classified as medium quality, has good chemical quality with respect to nitrate, because it did not exceed the maximum admissible nitrate concentration (50 mg L−1). However, it was not classified as high quality because its conductivity, chloride, and sulfate levels were higher than the indicative values established in [32], although they did not generally exceed the TV indicative of saline intrusion [34]. Another factor affecting groundwater quality is the presence of untreated slurry from pig farms on the surface, especially in the southern zone of this study [37]. Nitrate may reach the groundwater through irrigation water surpluses that infiltrate into the aquifer.
In the western zone, the models also identified some points with relatively good groundwater quality. Unlike in the other zones, rainfed farming is the most important crop system here, which could explain this better quality: less water, coming mainly from rainfall, is available to leach the agrochemicals used for fertilization.
Interestingly, groundwater quality was worse in the southern zone, where forestry is the main land use, than in the other sampling zones. This may be due to the presence of pig farms that produce slurry and to the abandonment of farmland. In the areas closest to the protected zone of Calblanque, regulations have considerably restricted both the cultivated area and the intensive cultivation techniques that were applied until some time ago. This has led to the abandonment of land traditionally used for forced (protected) cultivation or horticultural crops. Owners have used certain areas of this land to dispose of brine from the desalination plants that treat the highly saline aquifer water, which otherwise could not be used for irrigation.
Both methods also identified a large number of points, and therefore areas, with low groundwater quality. About 95% of these sampling points showed nitrate contamination (51–500 mg L−1) due to agricultural activity, and some samples also showed high salinity. As previously mentioned, besides being scarce, groundwater in Campo de Cartagena has serious chemical quality problems, related above all to high salinity and to nitrate of agricultural origin. Pollutants (agrochemicals) infiltrate the shallowest aquifer, the Quaternary, through irrigation return flows. Brine from small desalination plants spread throughout the region also reaches it, either because it is discharged without control onto the ground and then infiltrates or because it is injected directly into wells. Both pathways cause nitrate contamination and saline intrusion [33]. The interconnection between aquifers, such as between the deeper aquifers and the Quaternary aquifer, both through natural processes and through poorly constructed wells, has led to the overall contamination of groundwater in the studied area.
This contamination affects the Mar Menor, with the consequent risk of eutrophication [29]. The risk is aggravated during floods caused by heavy autumn rains, which leach the nitrate accumulated in the soils and ultimately carry it into the lagoon [39]. Given the vulnerability of Campo de Cartagena and the importance of its groundwater for the economic development of the area, protecting its aquifers must be the main objective [40].

4. Conclusions

Machine learning methods such as the Naïve-Bayes and Decision-tree models have proven to be accurate tools for predicting groundwater quality in Campo de Cartagena, a vulnerable semi-arid region of Spain with high agricultural activity. Its groundwater receives large amounts of nitrate from agricultural fertilizers and is affected by saline intrusion due to inadequate desalination practices. These models may therefore be useful for the further management of these waters. Proper water management usually requires extensive studies that provide many data, which are sometimes complex to interpret. Artificial intelligence models can facilitate such studies by managing and interpreting large amounts of data quickly and accurately, as the methods developed in this work show.
The Naïve-Bayes and Decision-tree models were designed with KNIME, a freely available open-source tool that does not require much computing power to obtain accurate results. Although the confusion matrices yielded the same number of failures for both methods, the strong dependence between some of the variables used in the model design makes the Decision-tree model more reliable than Naïve-Bayes in this case.
Groundwater quality was higher in the northern and western zones. The models detected more points with better groundwater quality in the northern zone. This may be due to the Andalusian aquifer in this area, which is deeper and can therefore hold higher-quality groundwater. In the western zone, rainfed farming is the most important crop system, so less water is available to leach fertilizers than in irrigated areas, because it comes mainly from rainfall. In the southern zone, although irrigated land is not the predominant land use, water quality is lower. This may be due to the many pig farms that do not dispose of their slurry adequately. In some cases, leachates from the slurry tanks infiltrate the soil and reach the groundwater, and in others, the slurry is even used directly as fertilizer on farms in the area. Another cause of lower groundwater quality may be the brine from the desalination plants that treat aquifer water for irrigation, which is disposed of in abandoned farming areas.

Author Contributions

Conceptualization, L.F.M., M.I.M.-L., E.M.G.-d.-T., M.Á.Q. and S.G.-S.; methodology, M.I.M.-L., E.M.G.-d.-T., M.Á.Q. and S.G.-S.; software, L.F.M.; validation, M.Á.Q., S.G.-S. and M.I.M.-L.; formal analysis, L.F.M. and E.M.G.-d.-T.; investigation, L.F.M., M.I.M.-L., E.M.G.-d.-T., M.Á.Q. and S.G.-S.; writing—original draft preparation, E.M.G.-d.-T.; writing—review and editing, M.Á.Q., M.I.M.-L. and S.G.-S.; supervision, M.Á.Q., S.G.-S. and M.I.M.-L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

The authors thank Alejandro Buenadicha García, student of the Degree in Civil Engineering of Universidad Politécnica de Madrid, for his collaboration in obtaining the experimental data.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wada, Y.; Van Beek, L.P.H.; Van Kempen, C.M.; Reckman, J.W.T.M.; Vasak, S.; Bierkens, M.F.P. Global depletion of groundwater resources. Geophys. Res. Lett. 2010, 37, 1–5.
  2. Perez, M.; Tujchneider, O.; Paris, M.; D’Elía, M. Sustainability indicators of groundwater resources in the Central Area of Santa Fe Province, Argentina. Environ. Earth Sci. 2015, 73, 2671–2682.
  3. Poeter, E.; Fan, Y.; Cherry, J.; Wood, W.M.D. The below-ground portion of our water cycle. In Groundwater in Our Water Cycle, 1st ed.; Cherry, J., Moran, S., de Oliveira, E.P.E., Eds.; Elsevier: Milton, ON, Canada, 2020; ISBN 978-1-7770541-1-3.
  4. Leduc, C.; Pulido-Bosch, A.; Remini, B. Anthropization of groundwater resources in the Mediterranean Region: Processes and challenges. Hydrogeol. J. 2017, 25, 1529–1547.
  5. Kinzelbach, W.; Bauer, P.; Siegfried, T.; Brunner, P. Sustainable groundwater management: Problems and scientific tools. Episodes 2003, 26, 279–284.
  6. Bhatnagar, A.; Sillanpää, M. A review of emerging adsorbents for nitrate removal from water. Chem. Eng. J. 2011, 168, 493–504.
  7. Abascal, E.; Gómez-Coma, L.; Ortiz, I.; Ortiz, A. Global diagnosis of nitrate pollution in groundwater and review of removal technologies. Sci. Total Environ. 2022, 810, 152233.
  8. Herschy, R. Water quality for drinking: WHO Guidelines. In Encyclopedia of Lakes and Reservoirs; Encyclopedia of Earth Sciences Series; Bengtsson, L., Herschy, R.W., Fairbridge, R.W., Eds.; World Health Organization: Geneva, Switzerland, 2012; pp. 876–883. ISBN 978-92-4-154995-0.
  9. Heaton, T.H.E.; Stuart, M.E.; Sapiano, M.; Micallef Sultana, M. An isotope study of the sources of nitrate in Malta’s groundwater. J. Hydrol. 2012, 414–415, 244–254.
  10. Re, V.; Sacchi, E.; Kammoun, S.; Tringali, C.; Trabelsi, R.; Zouari, K.; Daniele, S. Integrated socio-hydrogeological approach to tackle nitrate contamination in groundwater resources. The case of Grombalia Basin (Tunisia). Sci. Total Environ. 2017, 593–594, 664–676.
  11. Kapembo, M.L.; Laffite, A.; Bokolo, M.K.; Mbanga, A.L.; Maya-Vangua, M.M.; Otamonga, J.P.; Mulaji, C.K.; Mpiana, P.T.; Wildi, W.; Poté, J. Evaluation of water quality from suburban shallow wells under tropical conditions according to the seasonal variation, Bumbu, Kinshasa, Democratic Republic of the Congo. Exp. Health 2016, 8, 487–496.
  12. Comunidad de Regantes Campo de Cartagena. Available online: https://www.crcc.es/informacion-general/informacion-c-r-c-c/ (accessed on 9 January 2022).
  13. Martínez Fernández, J.; Fitz, C.; Esteve-Selma, M.Á.; Guaita, N.; Martinez-Lopez, J. Modelización del efecto de los cambios de uso del suelo sobre los flujos de nutrientes en cuencas agrícolas costeras: el caso del Mar Menor (Sudeste de España). Ecosistemas 2013, 22, 84–94.
  14. Pedrero Salcedo, F.; Pérez Cutillas, P.; Aziz, F.; Llobet Escabias, M.; Boesveld, H.; Bartholomeus, H.; Tallou, A. Soil salinity prediction using remotely piloted aircraft systems under semi-arid environments irrigated with salty non-conventional water resources. Agronomy 2022, 12, 2022.
  15. Alcolea, A.; Contreras, S.; Hunink, J.E.; García-Aróstegui, J.L.; Jiménez-Martínez, J. Hydrogeological modelling for the watershed management of the Mar Menor Coastal Lagoon (Spain). Sci. Total Environ. 2019, 663, 901–914.
  16. UN Sustainable Development Goals. Available online: https://www.un.org/sustainabledevelopment/development-agenda/ (accessed on 13 March 2022).
  17. Armenta-Medina, D.; Ramirez-Delreal, T.A.; Villanueva-Vásquez, D.; Mejia-Aguirre, C. Trends on advanced information and communication technologies for improving agricultural productivities: A bibliometric analysis. Agronomy 2020, 10, 1989.
  18. Gholami, V.; Booij, M.J. Use of machine learning and geographical information system to predict nitrate concentration in an unconfined aquifer in Iran. J. Clean. Prod. 2022, 360, 131847.
  19. Awais, M.; Aslam, B.; Maqsoom, A.; Khalil, U.; Ullah, F.; Azam, S.; Imran, M. Assessing nitrate contamination risks in groundwater: A machine learning approach. Appl. Sci. 2021, 11, 10034.
  20. El Bilali, A.; Taleb, A.; Brouziyne, Y. Groundwater quality forecasting using machine learning algorithms for irrigation purposes. Agric. Water Manag. 2021, 245, 106625.
  21. Alkindi, K.M.; Mukherjee, K.; Pandey, M.; Arora, A.; Janizadeh, S.; Pham, Q.B.; Anh, D.T.; Ahmadi, K. Prediction of groundwater nitrate concentration in a semiarid region using hybrid Bayesian artificial intelligence approaches. Environ. Sci. Pollut. Res. 2022, 29, 20421–20436.
  22. Chazarra Bernabé, A.; Flórez García, E.; Peraza Sánchez, B.; Tohá Rebull, T.; Lorenzo Mariño, B.; Criado Pinto, E.; Moreno García, J.V.; Romero Fresneda, R.; Botey Fullat, R. Mapas Climáticos de España (1981–2010) y ETo (1996–2016); Agencia Estatal de Meteorología (AEMET), Ministerio para la Transición Ecológica y el Reto Demográfico: Madrid, Spain, 2018; p. 75.
  23. Agencia Estatal de Meteorología (AEMET). Ministerio para la Transición Ecológica y el Reto Demográfico. Valores Climatológicos Normales. San Javier Aeropuerto. Available online: https://www.aemet.es/es/serviciosclimaticos/datosclimatologicos/valoresclimatologicos?l=7031&k=undefined (accessed on 22 November 2022).
  24. Instituto Tecnológico y Geominero de España (ITGE). Las Aguas Subterráneas del Campo de Cartagena (Murcia); ITGE: Madrid, Spain, 1993.
  25. Jiménez-Martínez, J.; Aravena, R.; Candela, L. The role of leaky boreholes in the contamination of a regional confined aquifer. A case study: The Campo de Cartagena Region, Spain. Water Air Soil Pollut. 2011, 215, 311–327.
  26. ISO 5667-11:2009; Water Quality. Sampling. Part 11: Guidance on Sampling of Groundwaters. International Organization for Standardization: Geneva, Switzerland, 2009.
  27. Rodger, B.; Bridgewater, L. Standard Methods for the Examination of Water and Wastewater; American Public Health Association: Washington, DC, USA, 2017.
  28. ISO 14911:1998; Water Quality. Determination of Dissolved Li+, Na+, NH4+, K+, Mn2+, Ca2+, Mg2+, Sr2+ and Ba2+ Using Ion Chromatography. Method for Water and Waste Water. International Organization for Standardization: Geneva, Switzerland, 2000.
  29. Buenadicha García, A. La Degradación Del Mar Menor: Causas y Planteamiento de Posible Solución Relacionada Con La Ingeniería Civil. Bachelor’s Thesis, Escuela Técnica Superior de Ingeniería Civil, Madrid, Spain, 2021.
  30. Karim, M.; Rahman, R.M. Decision tree and naïve Bayes algorithm for classification and generation of actionable knowledge for direct marketing. J. Softw. Eng. Appl. 2013, 6, 196–206.
  31. KNIME Analytics Platform. KoNstanz Information MinEr. Available online: https://www.knime.com/software-overview (accessed on 5 February 2022).
  32. European Union Directive (EU) 2020/2184 of the European Parliament and of the Council of 16 December 2020 on the quality of water intended for human consumption (recast). Off. J. Eur. Union 2020, 435, 12–23.
  33. BOE. Real Decreto 47/2022, de 18 de Enero, Sobre Protección de las Aguas Contra la Contaminación Difusa Producida Por Los Nitratos Procedentes de Fuentes Agrarias. Available online: https://www.boe.es/boe/dias/2022/01/20/pdfs/BOE-A-2022-860.pdf (accessed on 26 October 2022).
  34. BOE. Real Decreto 1514/2009, de 2 de Octubre, Por el Que Se Regula la Protección de las Aguas Subterráneas Contra la Contaminación y el Deterioro. Available online: https://www.boe.es/buscar/pdf/2009/BOE-A-2009-16772-consolidado.pdf (accessed on 26 October 2022).
  35. Pérez Hernándes, F.; Martínez Vicente, D.; Carmona Cabrera, A.; Barba Martínez, E.; Portillo Muñoz, J.M.; Mora Rufete, I.; Corbalán Pellicer, J.; Campillo Moreno, F. Estadística Agraria de Murcia 2017/18; Informe 26; Dirección General de Innovación, Producciones y Mercados Agroalimentarios. Consejería de Agua, Agricultura, Ganadería y Pesca. Comunidad Autónoma de la Región de Murcia: Murcia, Spain, 2018; pp. 61–90.
  36. Ministerio de Agricultura, Pesca y Alimentación. Mapa de Cultivos y Aprovechamientos de España. Available online: https://www.mapa.gob.es/es/agricultura/temas/sistema-de-informacion-geografica-de-datos-agrarios/mca.aspx (accessed on 9 June 2022).
  37. Ministerio para la Transición Ecológica y el Reto Demográfico. Análisis de Soluciones para el Vertido Cero al Mar Menor Proveniente del Campo de Cartagena. Estudio del Impacto Ambiental. APÉNDICE 1 Diagnóstico de la Problemática del Mar Menor; Ministerio para la Transición Ecológica y el Reto Demográfico: Madrid, Spain, 2019; pp. 1–23.
  38. Domingo-Pinillos, J.C.; Senent-Aparicio, J.; García-Aróstegui, J.L.; Baudron, P. Long term hydrodynamic effects in a semi-arid Mediterranean multilayer aquifer: Campo de Cartagena in South-Eastern Spain. Water 2018, 10, 1320.
  39. Velasco, J.; Lloret, J.; Millan, A.; Marin, A.; Barahona, J.; Abellan, P.; Sanchez-Fernandez, D. Nutrient and particulate inputs into the Mar Menor Lagoon (SE Spain) from an intensive agricultural watershed. Water Air Soil Pollut. 2006, 176, 37–56.
  40. Ministerio para la Transición Ecológica y el Reto Demográfico. Estado de Situación de las Actuaciones Previstas por la Administración General del Estado para Abordar la Situación del Mar Menor; Ministerio para la Transición Ecológica y el Reto Demográfico: Madrid, Spain, 2020.
Figure 1. Climograph of Campo de Cartagena (Murcia, Spain) for 1981–2010 (figure prepared by the authors with data from AEMET [23]).
Figure 2. Map of the studied zone showing groundwater sampling points (north zone: N, orange color; south zone: S, green color; center zone: C, blue color; west zone: O (from the Spanish Oeste), red color) [29].
Figure 3. General process flow chart used in Naïve-Bayes and Decision-tree models.
Figure 4. KNIME tool output diagram for the Naïve-Bayes model.
Figure 5. KNIME tool output diagram for the Decision-tree model.
Figure 6. Map of crops and soil uses of Campo de Cartagena (adapted from [36]).
Table 1. Geographic coordinates of the 47 sampling points studied in Campo de Cartagena.
Zone | Sampling Point | X (UTM) | Y (UTM) | Sampling Point | X (UTM) | Y (UTM)
Center | C1 | 678652 | 4180779 | C6 | 684489 | 4176462
Center | C2 | 678651 | 4178378 | C7 | 685400 | 4177655
Center | C3 | 681013 | 4178572 | C8 | 685886 | 4176189
Center | C4 | 682822 | 4177692 | C9 | 681013 | 4173533
Center | C5 | 682888 | 4177302 | C10 | 689915 | 4173913
South | S1 | 678134 | 4169109 | S9 | 679034 | 4164293
South | S2 | 680169 | 4169009 | S10 | 679840 | 4163229
South | S3 | 681177 | 4167963 | S11 | 680573 | 4164666
South | S4 | 687788 | 4167392 | S12 | 694476 | 4164593
South | S5 | 680826 | 4168169 | S13 | 694575 | 4164931
South | S6 | 692485 | 4168946 | S14 | 695333 | 4165861
South | S7 | 676797 | 4164479 | S15 | 700773 | 4166663
South | S8 | 676986 | 4163637 | | |
North | N1 | 680664 | 4198075 | N10 | 692809 | 4193259
North | N2 | 680703 | 4198219 | N11 | 673688 | 4189992
North | N3 | 680826 | 4197634 | N12 | 672248 | 4189785
North | N4 | 675789 | 4195142 | N13 | 672188 | 4189692
North | N5 | 671637 | 4192846 | N14 | 671820 | 4188127
North | N6 | 679973 | 4191938 | N15 | 679302 | 4185182
North | N7 | 691092 | 4193273 | N16 | 688113 | 4185836
North | N8 | 691695 | 4192247 | N17 | 691717 | 4187562
North | N9 | 692703 | 4193059 | N18 | 695504 | 4190549
West | O1 | 650134 | 4177265 | O3 | 657176 | 4177458
West | O2 | 652084 | 4178079 | O4 | 658556 | 4175594
Table 2. Physicochemical parameters of groundwater in Campo de Cartagena, corresponding to the 2014/2020 period.
ZoneParameterUnitMinimum ValueMaximum ValueMeanStandard DeviationN
CenterpH 7.08.527.50.478
Nitratemg L−18.2932112 × 1017 × 10178
Chloridemg L−1135357617 × 1028 × 10278
Sulfatemg L−1186283215 × 1025 × 10278
Electrical conductivityμS cm−1140911,0007 × 1032 × 10378
NorthpH 6.309.007.50.5116
Nitratemg L−1n.d.4171 × 1021 × 102116
Chloridemg L−145401812 × 1021 × 102116
Sulfatemg L−1482831283 × 1015 × 101116
Electrical conductivityμS cm−1121012,3505 × 1032 × 103116
SouthpH 6.748.407.40.367
Nitratemg L−1n.d.5222 × 1021 × 10267
Chloridemg L−1n.d.255911 × 1025 × 10267
Sulfatemg L−13644091 × 1031 × 10367
Electrical conductivityμS cm−169611,0605 × 1032 × 10367
WestpH 7.108.107.50.221
Nitratemg L−181426 × 1012 × 10121
Chloridemg L−1708.0012809 × 1021 × 10221
Sulfatemg L−1930180712 × 1022 × 10221
Electrical conductivityμS cm−13530580446 × 1036 × 10321
n.d.: not detected.
Table 3. Classification criteria of groundwater quality according to European and Spanish regulations [32,33,34].
Parameter | High Quality | Medium Quality | Low Quality
Nitrate (mg L−1) | <37.5 | 37.5–50 | >50
Electrical conductivity (μS cm−1) | <2500 | 2500–6500 | >6500
Chloride (mg L−1) | <250 | 250–1500 | >1500
Sulfate (mg L−1) | <250 | 250–1500 | >1500
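The thresholds in Table 3 can be encoded as a simple rule-based labeller, for instance to generate or check quality labels before training. This is a sketch under two assumptions not stated in the table: values exactly equal to a boundary (e.g., 50 mg L−1 nitrate) are placed in the medium class, and a sample with several parameters takes the worst class among them (a hypothetical "one out, all out" rule; the aggregation actually used in the study may differ). The parameter keys and the example sample are hypothetical.

```python
from enum import IntEnum


class Quality(IntEnum):
    # Ordered so that max() picks the worse class.
    HIGH = 0
    MEDIUM = 1
    LOW = 2


# (upper limit of "high", upper limit of "medium") for each parameter, from Table 3.
# The dictionary keys are hypothetical names chosen for this sketch.
THRESHOLDS = {
    "nitrate_mg_L": (37.5, 50.0),
    "conductivity_uS_cm": (2500.0, 6500.0),
    "chloride_mg_L": (250.0, 1500.0),
    "sulfate_mg_L": (250.0, 1500.0),
}


def classify_parameter(name: str, value: float) -> Quality:
    high_limit, medium_limit = THRESHOLDS[name]
    if value < high_limit:
        return Quality.HIGH
    if value <= medium_limit:  # assumption: boundary values count as medium
        return Quality.MEDIUM
    return Quality.LOW


def classify_sample(sample: dict[str, float]) -> Quality:
    # Hypothetical aggregation: the worst parameter class decides.
    return max(classify_parameter(name, value) for name, value in sample.items())


# Hypothetical sample, not a measurement from this study.
sample = {
    "nitrate_mg_L": 42.0,
    "conductivity_uS_cm": 1800.0,
    "chloride_mg_L": 180.0,
    "sulfate_mg_L": 300.0,
}
print(classify_sample(sample).name)  # MEDIUM
```

Here nitrate and sulfate fall in the medium band while conductivity and chloride are high, so the worst-class rule labels the sample as medium.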
Table 4. Pearson correlation coefficient (r) and p-value between variables (N = 282).
Variable | Statistic | Electrical Conductivity | Chloride | Sulfate | Nitrate
Electrical conductivity | r | 1 | 0.887 | 0.706 | 0.175
Electrical conductivity | p-value | – | 0.000 | 0.000 | 0.009
Chloride | r | 0.887 | 1 | 0.656 | 0.161
Chloride | p-value | 0.000 | – | 0.000 | 0.016
Sulfate | r | 0.706 | 0.656 | 1 | −0.004
Sulfate | p-value | 0.000 | 0.000 | – | 0.947
Nitrate | r | 0.175 | 0.161 | −0.004 | 1
Nitrate | p-value | 0.009 | 0.016 | 0.947 | –
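For reference, the coefficient r in Table 4 is the sample Pearson correlation: the covariance of two variables divided by the product of their standard deviations. A minimal pure-Python sketch follows; the paired readings are hypothetical, not data from this study, and the p-values in Table 4 come from a separate significance test that is not reproduced here. In practice, `scipy.stats.pearsonr` returns both r and a two-sided p-value.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Sample Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations (covariance up to a factor n - 1).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (variances up to the same factor).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired readings, for illustration only.
conductivity = [1500.0, 3200.0, 5400.0, 7100.0, 9800.0]  # uS/cm
chloride = [210.0, 640.0, 1300.0, 1750.0, 2900.0]  # mg/L
print(f"r = {pearson_r(conductivity, chloride):.3f}")
```

The common factor n − 1 cancels between numerator and denominator, which is why the sketch works with plain sums of deviations.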
Table 5. Data used and results provided by the Naïve-Bayes model (N = 85).
Actual QualityPredicted Quality Actual QualityPredicted Quality
Zone
SP 1
LowMediumHighLowMediumHighZone
SP 1
LowMediumHighLowMediumHigh
CenterCenter
C14 22 C65 5
C25 5 C82 2
C33 3
NorthNorth
N12 2 N124 4
N311 2 N1314 14
N7 3 3 N141 1
N8 1 1 N15 5 5
N9 8 71N172 2
N113 3 N185 5
WestWest
O12 2 O3 1 1
SouthSouth
S11 1 S82 2
S26 6 S92 2
S35 5 S10 1 1
S51 1 S132 2
S61 1 S15 1 1
Total actual quality: Low 60; Medium 24; High 1
Total predicted quality: Low 56; Medium 27; High 2
1 SP: Sampling points.
Table 6. Data used and results provided by the Decision-tree model (N = 85).
Actual QualityPredicted Quality Actual QualityPredicted Quality
Zone
SP 1
LowMediumHighLowMediumHighZone
SP 1
LowMediumHighLowMediumHigh
CenterCenter
C16 6 C64 4
C21 1 C71 1
C33 3 C831132
C44 4
NorthNorth
N13 3 N121 1
N321 3 N1312 12
N6 1 1 N141 1
N7 2 2 N15 3 3
N9 2 2 N174 4
N1152 43 N181 1
WestWest
O162 62 O3 3 3
SouthSouth
S12 2 S8 1 1
S24 4 S92 2
S31 1 S12 1 1
S51 1 S132 2
S63 3 S15 2 2
Total actual quality: Low 61; Medium 23; High 1
Total predicted quality: Low 57; Medium 28; High 0
1 SP: Sampling points.
Table 7. Results of the confusion matrices obtained for the Naïve-Bayes and Decision-tree models.
Model | Actual Quality | Predicted Low | Predicted Medium | Predicted High
Naïve-Bayes | Low | 56 | 4 | 0
Naïve-Bayes | Medium | 0 | 23 | 1
Naïve-Bayes | High | 0 | 0 | 1
Decision-tree | Low | 57 | 4 | 0
Decision-tree | Medium | 0 | 23 | 0
Decision-tree | High | 0 | 1 | 0
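Overall accuracy and per-class recall can be read directly from Table 7. In the sketch below, rows are taken as the actual class and columns as the predicted class, which is consistent with the row and column totals reported in Tables 5 and 6. Both models misclassify 5 of the 85 samples (accuracy 80/85 ≈ 0.941); they differ in where the errors fall, since the Decision-tree model does not recover the single high-quality sample.

```python
# Confusion matrices from Table 7.
# Rows: actual class; columns: predicted class; order (Low, Medium, High).
CLASSES = ("Low", "Medium", "High")
CONFUSION = {
    "Naive-Bayes": [[56, 4, 0], [0, 23, 1], [0, 0, 1]],
    "Decision-tree": [[57, 4, 0], [0, 23, 0], [0, 1, 0]],
}


def accuracy(matrix: list[list[int]]) -> float:
    # Correct predictions lie on the diagonal.
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def misclassified(matrix: list[list[int]]) -> int:
    total = sum(sum(row) for row in matrix)
    return total - sum(matrix[i][i] for i in range(len(matrix)))


def recall_per_class(matrix: list[list[int]]) -> dict[str, float]:
    # Share of each actual class that was predicted correctly (row-wise).
    recalls = {}
    for i, name in enumerate(CLASSES):
        row_total = sum(matrix[i])
        recalls[name] = matrix[i][i] / row_total if row_total else float("nan")
    return recalls


for model, matrix in CONFUSION.items():
    print(model, f"accuracy={accuracy(matrix):.3f}", f"errors={misclassified(matrix)}")
    print("  recall:", {k: round(v, 3) for k, v in recall_per_class(matrix).items()})
```

Both models report accuracy=0.941 and errors=5. Per-class recall is Low 0.933, Medium 0.958, High 1.0 for Naïve-Bayes and Low 0.934, Medium 1.0, High 0.0 for Decision-tree; with only one high-quality sample, the High figures should be read with caution.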
Table 8. Cultivated zones by sampling zones.
Zone | Crop System | Area (ha)
Center | Irrigated land ¹ | 5154.70
Center | Forestry ² | 1253.11
Center | Rainfed ³ | 224.42
South | Irrigated land ¹ | 18,957.57
South | Forestry ² | 26,847.15
South | Rainfed ³ | 11,661.44
North | Irrigated land ¹ | 18,302.80
North | Forestry ² | 3815.86
North | Rainfed ³ | 747.84
West | Irrigated land ¹ | 6178.58
West | Forestry ² | 6695.73
West | Rainfed ³ | 14,318.43
¹ Crop type in irrigated land: herbaceous crops, cereals for grain, industrial crops, flowers, vegetables, human consumption tubers, citrus fruits, almond tree, peach tree, olive grove, vineyard, forced crops (greenhouses, mesh cultivation, padded crops), others.
² Crop type in forestry: conifers, scrub and pasture, and unproductive.
³ Crop type in rainfed: rainfed farming, olive grove, fruit trees.
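The discussion of rainfed versus irrigated land can be checked against Table 8. The sketch below expresses rainfed land as a share of farmland, defined here (an assumption for this illustration, not a definition from the study) as irrigated plus rainfed area, excluding forestry.

```python
# Areas in ha from Table 8: (irrigated land, forestry, rainfed).
AREAS_HA = {
    "Center": (5154.70, 1253.11, 224.42),
    "South": (18957.57, 26847.15, 11661.44),
    "North": (18302.80, 3815.86, 747.84),
    "West": (6178.58, 6695.73, 14318.43),
}


def rainfed_share(irrigated: float, rainfed: float) -> float:
    """Rainfed area as a fraction of irrigated + rainfed area (forestry excluded)."""
    return rainfed / (irrigated + rainfed)


shares = {
    zone: rainfed_share(irrigated, rainfed)
    for zone, (irrigated, _forestry, rainfed) in AREAS_HA.items()
}
for zone, share in shares.items():
    print(f"{zone}: rainfed = {100 * share:.1f}% of farmland")
```

This gives about 4.2% (Center), 38.1% (South), 3.9% (North) and 69.9% (West), so the western zone is the only one where rainfed land exceeds irrigated land.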
Table 9. Irrigated crop zone by sampling zones.
CropsSampling Zone
CenterSouthNorthWest
Area (ha)
Herbaceous443095873496.52270
Cereals for grain01040
Industrial crops9.75717.250
Flowers7.51834.50
Vegetables and forced crops4195597819922228
Human consumption tubers218.253510156.7542
Citrus fruits170326572369938
Almond tree112.532842.5393
Peach tree1.5852.50
Olive grove914862149
Vineyard020120
Others33.752470.2524
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

García-del-Toro, E.M.; García-Salgado, S.; Mateo, L.F.; Quijano, M.Á.; Más-López, M.I. Machine Learning as a Diagnosis Tool of Groundwater Quality in Zones with High Agricultural Activity (Region of Campo de Cartagena, Murcia, Spain). Agronomy 2022, 12, 3076. https://doi.org/10.3390/agronomy12123076

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.