Application of a Novel Hybrid Wavelet-ANFIS/Fuzzy C-Means Clustering Model to Predict Groundwater Fluctuations

Mohammad Mahdi Jafari; Hassan Ojaghlou; Mohammad Zare; Guy Jean-Pierre Schumann

doi:10.3390/atmos12010009

¹ Department of Water Science and Engineering, University of Zanjan, Zanjan 45371-38791, Iran

² Research and Education Department (RED), RSS-Hydro, 3593 Dudelange, Luxembourg

³ Institute of Civil and Environmental Engineering (INCEEN), Faculty of Science, Technology and Communication (FSTC), University of Luxembourg, 4365 Esch-sur-Alzette, Luxembourg

⁴ School of Geographical Sciences, University of Bristol, Bristol BS8 1TL, UK

Atmosphere 2021, 12(1), 9; https://doi.org/10.3390/atmos12010009

This article belongs to the Special Issue Artificial Intelligence and Machine Learning: Application in Predictive Hydrological Models

Abstract

Optimal management of groundwater resources requires accurate estimates of groundwater level (GWL) fluctuations. In recent years, artificial intelligence methods based on data mining theory have attracted increasing attention. The goal of this research is to evaluate and compare the performance of adaptive network-based fuzzy inference system (ANFIS) and Wavelet-ANFIS models based on fuzzy c-means (FCM) clustering for simulation/prediction of monthly GWL in the Maragheh plain in northwestern Iran. A 22-year dataset (1996–2018) including hydrological parameters such as monthly precipitation (P) and GWL from 25 observation wells was used as model input data. To improve the prediction accuracy of the hybrid Wavelet-ANFIS model, different mother wavelets and different numbers of clusters and decomposition levels were investigated. The new hybrid model with the Sym4 mother wavelet, two clusters, and a decomposition level of 3 showed the best performance. The maximum values of R² in the training and testing phases were 0.997 and 0.994, respectively, and the best RMSE values were 0.05 and 0.08 m, respectively. Comparing the results of the ANFIS and hybrid Wavelet-ANFIS models shows that the hybrid model is an acceptable method for modeling GWL because it employs both the wavelet transform and the FCM clustering technique.

Keywords:

artificial intelligence; groundwater level; Wavelet-ANFIS; decomposition level

1. Introduction

Hydrological changes, some linked to global climate change, will have devastating effects on surface water and groundwater resources in many areas of the world [1]. Many areas, such as the Middle East, including Iran, are already facing water shortages. Iran has arid and semi-arid climates with limited surface water resources, so groundwater is the main water resource in most regions [2,3]. Recent droughts, coupled with irregular groundwater withdrawal through deep and semi-deep wells, have caused a severe drop in groundwater levels (GWL) and consequently water quality degradation, soil erosion, drying of wells, reduced river discharge, increased pumping costs, and desertification [4]. Effective water resources management is therefore vital. The monitoring, simulation, and prediction of GWL help water planners apply appropriate measures for water resources management. GWL can be measured in the field using boreholes, which is, however, expensive and time consuming. In recent years, indirect methods such as artificial intelligence (AI) and machine learning (ML) models have been developed to analyze the trends of hydrological parameters. These methods have been used in many hydro(geo)logical studies, specifically in GWL predictions [5,6,7,8,9,10,11]. Finding suitable models that can simulate GWL oscillations is an ongoing challenge for hydrogeologists. This is because GWL changes are affected by hydrological variables such as precipitation, evaporation, and recharge through the unsaturated zone, so they do not depend solely on groundwater pumping. All these processes are non-linear, random, and complex [12]. One of several ML approaches is the Adaptive Neuro-Fuzzy Inference System (ANFIS). ANFIS models are a class of adaptive networks that are functionally equivalent to fuzzy inference systems. They combine neural networks (a network-based approach) with fuzzy logic, which makes them an attractive processing tool. 
This method combines the benefits and capabilities of neural networks and fuzzy logic at the same time [13]. Several studies have applied ANFIS to groundwater modeling [14,15,16,17,18,19,20]. For example, GWL changes in the Miandarband plain of Iran were simulated using ANN and ANFIS models, and ANFIS was found to be an adequate method [21]. Although ANFIS is one of the most suitable data-driven models for simulating GWL fluctuations, it uses a traditional grid partitioning (GP) method to find the minimum square error in simulation. The GP method divides the data space into rectangular subspaces called "grids" based on the number of membership functions and their types, so a high-dimensional fuzzy grid is generated [22]. The parameters of this large rule-base in the fuzzy inference system (FIS) must be optimized, which may cause a dimensionality problem. To solve this problem, a powerful unsupervised clustering algorithm called fuzzy c-means (FCM) clustering has been considered in recent years. FCM organizes the data into groups based on similarities, which helps ANFIS generate fewer rules, thereby reducing data dimensionality; consequently, less processing time is required than with the GP method [23]. Artificial intelligence models are not always applicable to hydrological studies because natural time series are non-stationary most of the time. De-noising is therefore a fundamental issue in time series analysis, because noise has a considerable impact on the actual characteristics of time series [24,25]. Time series observed in nature usually exhibit non-stationary, multi-time-scale characteristics. However, traditional de-noising methods are largely based on model simulation or spectral analysis; they cannot represent these complex series characteristics and therefore cannot meet practical needs [26,27]. 
In comparison, the wavelet threshold de-noising (WTD) method is more effective and can be used in various engineering applications, since it can clarify the local characteristics of non-stationary time series in both the time and frequency domains [28,29]. Although theoretically powerful, in practice the WTD method is affected by four key issues: the choice of wavelet, the choice of decomposition level (DL), the threshold estimation, and the choice of thresholding rules. The wavelet signal processing approach can be coupled with the FCM-ANFIS model to overcome the non-stationary time series problem [30,31,32,33,34]. Wavelet-ANFIS approaches have been widely used in water resources management and hydrological projects, including GWL simulation [35,36,37]. GWL predictions were carried out using the hybrid Wavelet-ANFIS model in the Mashhad plain for different time intervals and were validated by borehole observations [38]. For the prediction of GWL changes in the Miandarband floodplain in western Iran, the Wavelet-ANFIS and ANFIS models based on FCM were comparatively evaluated in [11]. Comparing the model outputs indicated that the hybrid Wavelet-ANFIS model is more accurate. To the best of our knowledge, the choice of decomposition level in WTD has been little discussed in previous studies, and no effective and practical approach is presently available [39]. Among the methods for the choice of decomposition level, the following are worth mentioning: wavelet threshold de-noising, autocorrelations, and energy distributions of noises, although the latter method is mostly used in studies related to water engineering. The main purpose of the present study is to evaluate and compare the performance of ANFIS and Wavelet-ANFIS models based on FCM for the simulation and prediction of monthly GWL in some parts of the Maragheh plain in northwestern Iran. 
In line with the main goal of this study, we attempt to investigate the effect of different decomposition levels on accuracy of the Wavelet-ANFIS model and determine the optimal level.

2. Materials and Methods

2.1. Study Area

The study area is a part of the Maragheh plain, located in northwestern Iran, between latitudes 37°11′–37°28′ and longitudes 46°–46°11′. The region is bounded in the East by Lake Urmia and has a surface area of about 414 km². The long-term annual rainfall and temperature are 294 mm and 14 °C, respectively. The climate of the region is arid and cold (see Figure 1).

Figure 1. The Maragheh plain in northwestern Iran.

2.2. Data Description

In order to simulate and predict the GWL fluctuations in the Maragheh plain, the monthly averages of rainfall and GWL data were collected from the State Government. The observed precipitation and GWL (Table 1) are available for the years 1996 to 2018 (261 months). The locations of these wells with their Thiessen polygons are shown in Figure 2.

Table 1. List of the 25 wells and the areas of their Thiessen polygons.

Figure 2. Borehole locations with Thiessen polygons.

2.3. Completion of Groundwater Level Missing Values

Among the 25 observation wells, only 18 had complete raw GWL time series for the period 1996 to 2018. The missing monthly GWL values for the seven remaining wells (see Table 2) were estimated by inverse distance weighting (IDW) interpolation. The credibility of the estimated values was checked by comparison with the previous month's values. Subsequently, the GWL data were averaged with weights given by the 25 Thiessen polygon areas to create a weighted-average hydrograph of the GWL fluctuations (Figure 3).

Table 2. Number of missing values for wells with missing data.

Figure 3. Average groundwater level (GWL) fluctuations in the Maragheh plain during 1996–2018.
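The two pre-processing steps of this section, IDW gap filling and Thiessen area weighting, can be illustrated with a minimal sketch. The function names, the distance exponent of 2, and all coordinates, levels, and polygon areas below are hypothetical and chosen only for illustration; the paper does not state the IDW exponent or the neighbour selection that was actually used.

```python
import math


def idw_estimate(target_xy, neighbours, power=2.0):
    """Inverse-distance-weighted estimate at target_xy.

    neighbours: list of ((x, y), value) pairs from wells with data.
    power: distance exponent (2 is a common default, assumed here).
    """
    numerator = 0.0
    denominator = 0.0
    for (x, y), value in neighbours:
        dist = math.hypot(x - target_xy[0], y - target_xy[1])
        if dist == 0.0:
            return value  # target coincides with a measured well
        weight = 1.0 / dist ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def thiessen_average(levels, areas):
    """Area-weighted mean of well levels using Thiessen polygon areas."""
    total_area = sum(areas)
    return sum(lv * ar for lv, ar in zip(levels, areas)) / total_area


# Hypothetical wells: coordinates in km, levels in m, areas in km^2.
wells = [((0.0, 0.0), 1300.0), ((2.0, 0.0), 1304.0)]
gap_value = idw_estimate((1.0, 0.0), wells)
mean_level = thiessen_average([1300.0, 1304.0, gap_value], [10.0, 30.0, 20.0])
```

Because the target point is equidistant from both wells, the IDW estimate is their plain average; the area-weighted mean then leans toward the well with the largest polygon.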

2.4. Training and Testing of GWL Prediction Models

The ANFIS and Wavelet-ANFIS models were trained on the monthly observed GWL and precipitation (P) time series. The following input-output prediction model for GWL at time t as a function of GWL_{t−i} and P_{t−j} was applied:

GWL_{t} = f(GWL_{t-i}, P_{t-j}), \quad i = 1, 2, \dots, n; \quad j = 0, 1, 2, \dots, n

(1)

i.e., it is assumed that the groundwater level in month t (GWL_t) is correlated with the groundwater levels of up to n previous months (GWL_{t−i}) and with the precipitation P_{t−j}. Although Equation (1) is essentially a simple transfer model, it has a physical basis: groundwater data usually exhibit persistence, as illustrated in Figure 3, because groundwater is recharged by rainfall with a delay [40,41].

Autocorrelation analysis of the GWL time series and cross-correlation analysis of the GWL and precipitation (P) time series were performed. The continuous autocorrelation function R_xx of a time series x is defined as [41]:

R_{xx}(l) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} x(t) \, x(t + l) \, dt

(2)

where T is the period of observation and l is the lag time. For discrete data, Equation (2) is replaced by its discrete homologue which for x = GWL, reads

R_{GWL,GWL}(l) = \frac{1}{N} \sum_{n=1}^{N-1} GWL(n) \, GWL(n - l)

(3)

where N (= 261) is the number of observed months. The cross-correlation function R_xy between two time series x and y is defined similarly to Equations (2) and (3), so that the discrete cross-correlation between GWL and P at lag l reads:

R_{GWL,P}(l) = \frac{1}{N} \sum_{n=1}^{N-1} GWL(n) \, P(n - l)

(4)
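A lag analysis in the spirit of Equations (3) and (4) can be sketched as follows. Note one deliberate difference, which is an assumption of this sketch and not the paper's stated procedure: the function below returns a normalized (Pearson) correlation over the overlapping part of the two series, so that values fall between −1 and 1 and can be compared across lags. The function name and the demonstration series are hypothetical.

```python
import math


def lagged_correlation(x, y, lag):
    """Sample correlation between x(n) and y(n - lag) for lag >= 0.

    With y = x this is an autocorrelation; with y = precipitation it is
    the cross-correlation between GWL and P at the given lag.
    """
    n = len(x)
    if len(y) != n or not 0 <= lag < n - 1:
        raise ValueError("series must share a length greater than lag + 1")
    xs = x[lag:]       # x(n) for n = lag, ..., N - 1
    ys = y[:n - lag]   # y(n - lag) for the same n
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in xs))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in ys))
    return cov / (norm_x * norm_y)


# Hypothetical series: the "level" simply repeats "precipitation" two steps later.
precip_demo = [0.0, 1.0, 0.0, 2.0, 0.0, 3.0, 1.0, 0.0, 2.0, 1.0]
gwl_demo = [0.0, 0.0] + precip_demo[:-2]
corr_at_lag_2 = lagged_correlation(gwl_demo, precip_demo, 2)
corr_at_lag_0 = lagged_correlation(gwl_demo, gwl_demo, 0)
```

Scanning `lag` over a range of months and keeping the lags with the largest correlation is one simple way to choose the inputs of Equation (1).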

Figure 4 shows that GWL_t is mostly correlated with GWL_{t−1}, GWL_{t−2}, GWL_{t−3} and with P_{t−1}, P_{t−2}, P_{t−3}, P_{t−4}. Therefore, the final input-output form of Equation (1) for the Wavelet-ANFIS and ANFIS models can be written as:

GWL_{t} = f(GWL_{t-1}, GWL_{t-2}, GWL_{t-3}, P_{t-1}, P_{t-2}, P_{t-3}, P_{t-4})

(5)

Figure 4. Autocorrelation function for GWL (a) and cross-correlation function between GWL and precipitation (b).

For the training and testing of the models, the cross-validation method was used, i.e., the 261-value series of observed average GWL and P data were divided into a training set of 183 months and a testing set of 78 months used for prediction with the trained model. Thus, the training set comprises 70% of the data and the test (prediction) set the remaining 30%.
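The input structure of Equation (5) and the 70/30 split can be sketched as below. The helper names are hypothetical, and the split is assumed to be chronological (earliest months for training), which the text suggests but does not state explicitly. How the first four months, which lack a full lag history, enter the reported 183/78 counts is also not specified in the text.

```python
def build_lagged_samples(gwl, precip, gwl_lags=(1, 2, 3), p_lags=(1, 2, 3, 4)):
    """Return (inputs, targets) following the lag structure of Equation (5)."""
    max_lag = max(max(gwl_lags), max(p_lags))
    inputs, targets = [], []
    for t in range(max_lag, len(gwl)):
        row = [gwl[t - i] for i in gwl_lags] + [precip[t - j] for j in p_lags]
        inputs.append(row)
        targets.append(gwl[t])
    return inputs, targets


def chronological_split(inputs, targets, train_fraction=0.7):
    """Split samples in time order: first part for training, rest for testing."""
    n_train = round(train_fraction * len(inputs))
    train = (inputs[:n_train], targets[:n_train])
    test = (inputs[n_train:], targets[n_train:])
    return train, test


# Hypothetical 12-month series, for illustration only.
gwl_series = [100.0 + k for k in range(12)]
p_series = [float(k) for k in range(12)]
samples_x, samples_y = build_lagged_samples(gwl_series, p_series)
(train_x, train_y), (test_x, test_y) = chronological_split(samples_x, samples_y)
```

Keeping the split chronological avoids training on months that come after the months being predicted.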

2.5. Adaptive Neuro Fuzzy Inference System (ANFIS)

The adaptive neuro fuzzy inference system (ANFIS), first introduced by [13], is a combination of an adaptive neural network and a fuzzy inference system. The model also performs adequately in training, generation, and classification, and has several advantages over a classical ANN, such as the capability of handling large amounts of data, modeling of dynamic and nonlinear systems, ease of use, and efficient computing, while still exhibiting high estimation and prediction accuracies [42]. Combining the benefits and capabilities of neural network architectures and fuzzy logic, ANFIS uses a hybrid approach of the classical gradient descent procedure and systematic back-propagation and thus avoids the "trap" of the error function in a local minimum, typical of the former optimization method [43]. The adaptive network based on the Sugeno fuzzy inference model provides a deterministic system of output equations and is thus a useful approach for parameter estimation [44]. The ANFIS model consists of five layers and is shown in Figure 5 [13]. The first layer is called the input layer; O_{1,i} is the output of the i-th node of layer 1.

O_{1,i} = \mu_{A_{i}}(x_{1}), \quad O_{1,i+2} = \mu_{B_{i}}(x_{2}), \quad i = 1, 2

(6)

where x_1 and x_2 are the inputs to node i and A_i (or B_i) is a linguistic label associated with this node. Therefore, O_{1,i} is the membership grade of a fuzzy set (A₁, A₂, B₁, B₂). The second layer consists of rule nodes with AND and/or OR operators. The output (O_{2,i}) is the product of all the incoming signals.

O_{2,i} = w_{i} = \mu_{A_{i}}(x_{1}) \times \mu_{B_{i}}(x_{2}), \quad i = 1, 2

(7)

Figure 5. Architecture of ANFIS [11].

The third layer outputs (O_{3,i}) are called normalized firing strengths.

O_{3,i} = \bar{w}_{i} = \frac{w_{i}}{w_{1} + w_{2}}, \quad i = 1, 2

(8)

The fourth layer, called consequent node, is a standard perceptron. Every node i in this layer is an adaptive node with a node function:

O_{4,i} = \bar{w}_{i} f_{i} = \bar{w}_{i} \cdot (p_{i} x_{1} + q_{i} x_{2} + r_{i}), \quad i = 1, 2

(9)

where (p_i, q_i, r_i) is the parameter set of this node; these are referred to as the consequent parameters. Finally, the fifth layer is called the output layer, which computes the overall output as the summation of all incoming signals.

O_{5} = \text{overall output} = \sum_{i} \bar{w}_{i} f_{i} = \frac{\sum_{i} w_{i} f_{i}}{\sum_{i} w_{i}}

(10)
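The five layers of Equations (6) to (10) can be traced in a minimal forward-pass sketch for two inputs and two rules. Gaussian membership functions are an assumption made here for illustration, since the text does not name the membership function type; the function names and all parameter values are likewise hypothetical. Training of the premise and consequent parameters is not shown.

```python
import math


def gaussian_mf(x, centre, sigma):
    """Gaussian membership grade (assumed membership function shape)."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)


def anfis_forward(x1, x2, premise, consequent):
    """Forward pass of a two-input, two-rule first-order Sugeno ANFIS.

    premise: {"A": [(centre, sigma), ...], "B": [(centre, sigma), ...]}
    consequent: [(p, q, r), (p, q, r)], one triple per rule.
    Returns the overall output and the normalized firing strengths.
    """
    # Layer 1: membership grades, Equation (6)
    mu_a = [gaussian_mf(x1, c, s) for c, s in premise["A"]]
    mu_b = [gaussian_mf(x2, c, s) for c, s in premise["B"]]
    # Layer 2: rule firing strengths as products, Equation (7)
    w = [mu_a[i] * mu_b[i] for i in range(2)]
    # Layer 3: normalized firing strengths, Equation (8)
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: weighted linear consequents, Equation (9)
    f = [p * x1 + q * x2 + r for p, q, r in consequent]
    layer4 = [w_bar[i] * f[i] for i in range(2)]
    # Layer 5: overall output as the sum of incoming signals, Equation (10)
    return sum(layer4), w_bar


# Hypothetical parameters, chosen so both rules fire equally at (1, 1).
demo_premise = {"A": [(0.0, 1.0), (2.0, 1.0)], "B": [(0.0, 1.0), (2.0, 1.0)]}
demo_consequent = [(1.0, 1.0, 0.0), (2.0, 0.0, 1.0)]
anfis_output, firing = anfis_forward(1.0, 1.0, demo_premise, demo_consequent)
```

At the input (1, 1) the two rules have equal firing strength, so the output is the plain average of the two rule consequents (2 and 3).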

2.6. Fuzzy C-Means (FCM) Clustering Method

ANFIS uses neural network training methods and employs fuzzy logic to fit a relationship between an input space and an output space [13]. The problem with this approach is that, as the number of inputs increases, the number of rules rises rapidly. In other words, the number of fuzzy rules increases exponentially [22]. To avoid the dimensionality problem of generating overly large rule-bases in ANFIS, clustering algorithms are used in FIS models to generate fewer fuzzy rules. Clustering methods provide the information needed to obtain the initial rules and fuzzy structure for generating the fuzzy inference system [45]. The fuzzy inference system based on these clustering methods can then provide a powerful model for establishing relationships between the inputs and the output space. The fuzzy c-means (FCM) clustering method was initially introduced by Bezdek in 1973 [46,47] and is known as an improvement and modification of K-means clustering. In this approach, each data point belongs partly to all clusters, but with different membership degrees that can vary between 0 and 1. The objective of FCM is to minimize the total intra-cluster variance, i.e., the summed square error function within clusters (J_m) [48]:

J_{m}(U, V) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} \, d^{2}(x_{j}, v_{i})

(11)

where U = [u_ij]_{c×n} is the fuzzy partition matrix of c clusters and n data points, V = (v₁, v₂, …, v_c) are the centroids of the clusters, u_ij is the partial membership degree of data point x_j in cluster i (with 0 \leq u_{ij} \leq 1 and \sum_{i=1}^{c} u_{ij} = 1 for j = 1, 2, \dots, n), m is the weighting exponent on each fuzzy membership (which is equal to 2 in this study), X = (x₁, x₂, …, x_n) is the dataset, v_i is the center of the i-th cluster (starting from an initial value), and d(x_j, v_i) = ‖x_j − v_i‖ is the Euclidean distance between the data point x_j and the center v_i of the i-th cluster. Equation (11) describes a non-linear optimization problem which can be solved by iterative minimization. The centroid of each cluster is obtained by setting the partial derivative of Equation (11) with respect to V to zero; the partial membership degree of the data is then updated in each iteration by differentiating the above equation with respect to U:

u_{ij} = \left( \sum_{k=1}^{c} \left( \frac{d(x_{j}, v_{i})}{d(x_{j}, v_{k})} \right)^{\frac{2}{m-1}} \right)^{-1}, \quad i = 1, 2, \dots, c; \quad j = 1, 2, \dots, n

(12)

The iteration stops when the intra-cluster variance changes by less than a specified minimum improvement (10⁻⁸ in this study) over five consecutive iterations, i.e., when the FCM objective function has converged, or when the maximum number of iterations (1000 in this study) is exceeded. In the present study, the FCM algorithm was incorporated into (i.e., coded in) the ANFIS model. The procedure is sketched in Figure 6. The optimal number of clusters was obtained by trial and error. The RMSE values were calculated for different numbers of clusters in the training and test phases (Figure 7). As shown in Figure 7, using two clusters resulted in the least error.

Figure 6. Flow diagram of the combination of fuzzy c-means clustering (FCM) method and ANFIS [11].

Figure 7. The values of RMSE for different number of clusters in the FCM method.
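The alternating updates behind Equations (11) and (12) can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the code used in the study: the centroid update uses the standard FCM weighted mean with weights u_ij^m, the partition matrix is initialized randomly, and the stopping rule is simplified to a single-iteration change in the objective function instead of the five-iteration criterion described above. The function name and the demonstration data are hypothetical.

```python
import math
import random


def fcm(data, c=2, m=2.0, max_iter=1000, tol=1e-8, seed=0):
    """Fuzzy c-means on a list of equal-length numeric lists.

    Returns (centres, u) where u[i][j] is the membership of point j in cluster i.
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial partition matrix whose columns sum to 1 (assumption).
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for j in range(n):
        col = sum(u[i][j] for i in range(c))
        for i in range(c):
            u[i][j] /= col
    centres = []
    previous = float("inf")
    for _ in range(max_iter):
        # Centroid update: mean of the data weighted by u_ij^m.
        centres = []
        for i in range(c):
            wts = [u[i][j] ** m for j in range(n)]
            total = sum(wts)
            centres.append(
                [sum(wts[j] * data[j][k] for j in range(n)) / total for k in range(dim)]
            )
        dist = [[math.dist(data[j], centres[i]) for j in range(n)] for i in range(c)]
        # Membership update, Equation (12).
        for j in range(n):
            hits = [i for i in range(c) if dist[i][j] == 0.0]
            if hits:  # point sits exactly on one or more centres
                for i in range(c):
                    u[i][j] = 1.0 / len(hits) if i in hits else 0.0
                continue
            for i in range(c):
                u[i][j] = 1.0 / sum(
                    (dist[i][j] / dist[k][j]) ** (2.0 / (m - 1.0)) for k in range(c)
                )
        # Objective function, Equation (11).
        objective = sum(
            u[i][j] ** m * dist[i][j] ** 2 for i in range(c) for j in range(n)
        )
        if abs(previous - objective) < tol:
            break
        previous = objective
    return centres, u


# Hypothetical one-dimensional data with two obvious groups.
points = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
fcm_centres, fcm_u = fcm(points, c=2)
sorted_centres = sorted(v[0] for v in fcm_centres)
```

In a cluster-based ANFIS, each cluster found this way typically seeds one fuzzy rule, which is why a small c keeps the rule-base small.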

2.7. Wavelet Transformation

The Wavelet Transformation (WT) method is a mathematical tool related to the Fourier transform (FT), but it is more effective than the FT for studying nonstationary signals [49]. WT is popular because of its multi-resolution capability in the time and frequency domains for temporal data, and it has been developed to extract information that cannot easily be obtained from the raw signals themselves. WT can capture different aspects of a time series such as trends, discontinuities, and breakpoints [50,51]. WT has been widely used for many hydrological processes, including GWL fluctuations, river flow forecasting, fault classification, and suspended sediments. The continuous wavelet transform (CWT) W(τ,s) of a signal x(t) with respect to a mother wavelet ψ(t) is defined as follows [52]:

W(\tau, s) = |s|^{-\frac{1}{2}} \int_{-\infty}^{+\infty} x(t) \, \psi^{*}\left(\frac{t - \tau}{s}\right) dt, \quad \tau \in R, \; s \in R, \; s \neq 0

(13)

where τ is the translation parameter, s is the wavelet scale parameter, t is time, and * refers to the complex conjugate. However, the CWT is not often used for forecasting because it requires calculating wavelet coefficients at each scale which means more calculations and, in turn, more compute time. Instead, the discrete wavelet transform (DWT) requires less computation time and is simpler to develop [53]. Discrete wavelets have the following general form [49]:

\psi_{m,n}(t) = s_{0}^{-\frac{m}{2}} \, \psi\left(\frac{t - n \tau_{0} s_{0}^{m}}{s_{0}^{m}}\right)

(14)

where m and n are integers that control the scale and time, respectively, τ₀ is the location parameter, which must be greater than 0, and s₀ is a specified fixed dilation step greater than 1. The DWT applies two filters, a high-pass and a low-pass filter, to the original time series, which is thereby decomposed into two parts, namely "approximation" and "details" [54]. These components explain the behavior better and reveal more information about the process than the original time series. Therefore, they can help forecasting models predict with greater accuracy [55,56]. Many wavelets can be used as mother wavelets. For the choice of the discrete father-scaling/mother-wavelet filter functions in the DWT/MRA, popular ones are the Haar wavelet, the Daubechies wavelet db4, and the irregular Symlet wavelet (sym4), and these are the ones used here [38,57].
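The split into "approximation" and "details" can be made concrete with the Haar wavelet, the simplest of the three mother wavelets named above. The sketch below is a hand-written Haar DWT under simplifying assumptions: the series length must be divisible by 2 at every level, and no boundary extension is applied. The db4 and sym4 wavelets use longer filters and would normally be taken from a wavelet library, which is not shown here. Function names and the demonstration series are hypothetical.

```python
import math


def haar_step(signal):
    """One DWT level with the Haar filter pair; len(signal) must be even."""
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[k] + signal[k + 1]) * scale for k in range(0, len(signal), 2)]
    detail = [(signal[k] - signal[k + 1]) * scale for k in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Multi-level decomposition; details[0] holds the finest scale."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("length must be divisible by 2 at every level")
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose, starting from the coarsest level."""
    scale = 1.0 / math.sqrt(2.0)
    current = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(current, detail):
            rebuilt.append((a + d) * scale)
            rebuilt.append((a - d) * scale)
        current = rebuilt
    return current


# Hypothetical 8-value series decomposed to level 3.
series = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx3, detail_levels = haar_decompose(series, 3)
rebuilt_series = haar_reconstruct(approx3, detail_levels)
```

Each level halves the number of coefficients, and the original series is recovered exactly from the level-3 approximation plus the three detail sets.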

2.8. Hybrid Wavelet-ANFIS Model

As shown by [58], a combination of DWT and ANFIS can lead to greater accuracy of GWL prediction. In this regard, after the determination of the important parameters, the ANFIS model with the FCM rule generator and the DWT-MRA-decomposed time series of the average GWL and P are integrated to form the Wavelet-ANFIS hybrid model. More specifically, several combinations of different time-lagged GWL and P data were decomposed with the Haar, db4, and sym4 wavelets at decomposition levels 1, 2, 3, and 4.

3. Results and Discussion

The performance of the ANFIS and Wavelet-ANFIS models was evaluated using the R² and RMSE statistics. As Table 3 shows, and as expected, the ANFIS model performs worse than the Wavelet-ANFIS hybrid model, because the hybrid takes advantage of both ANFIS and WT simultaneously. In particular, WT, or more specifically the DWT-MRA, can capture various aspects of a time series such as trends, discontinuities, and breakpoints. This time-scale localization was used to decompose the input time series up to level 4 in the present application, separating the approximation from the details of the non-stationary, noisy time series. The decomposition thus enhances ANFIS, which combines fuzzification of the inputs through membership functions with a network-based algorithm and the hybrid (back-propagation and gradient descent) optimization method. Table 3 shows that the minimum RMSE and maximum R² are 0.05 and 0.997 in the training phase and 0.08 and 0.974 in the test phase; these were obtained with the hybrid Wavelet-ANFIS model using the Sym4 mother wavelet (second input combination, decomposition level 3). The best result for the Haar mother wavelet in the test phase has RMSE and R² values of 0.13 and 0.929, respectively; the corresponding best values for the db4 mother wavelet are 0.09 and 0.968. Comparing db4 and sym4 reveals no significant difference between the two, although sym4 slightly outperforms db4. The Haar mother wavelet gives the least accurate results, since it is the only orthogonal wavelet with a linear phase and cannot provide nonlinear shifts between the original and decomposed signals. 
By adding the precipitation parameter to the input data, the output accuracy increased in most simulations (on average, 10.2%), although it was more significant in the Wavelet-ANFIS models. In this research, the effect of the decomposition level on the accuracy of the output results of the Wavelet-ANFIS models was investigated. Among the different levels, the decomposition at level 3 had the highest accuracy. However, the accuracy of decomposition level 2 was also acceptable (with only a small difference compared to level 3).
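The two evaluation statistics can be computed as sketched below. The text does not give the formula used for R²; this sketch assumes the squared Pearson correlation between observed and simulated values, which matches the regression-plot interpretation, while the coefficient of determination (1 − SSE/SST) is a common alternative that can give different values. The function names and the demonstration values are hypothetical.

```python
import math


def rmse(observed, simulated):
    """Root mean square error, in the units of the data (m for GWL)."""
    squared = [(o - s) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, simulated):
    """Squared Pearson correlation between observed and simulated values."""
    mean_o = sum(observed) / len(observed)
    mean_s = sum(simulated) / len(simulated)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov ** 2 / (var_o * var_s)


# Hypothetical observed and simulated levels.
obs_demo = [1.0, 2.0, 3.0, 4.0]
sim_demo = [1.1, 1.9, 3.2, 3.8]
demo_rmse = rmse(obs_demo, sim_demo)
demo_r2 = r_squared(obs_demo, sim_demo)
```

Note that a squared correlation of 1 only indicates a perfect linear relation; a simulation with a constant bias still scores 1, which is why RMSE is reported alongside it.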

Table 3. Results of the various model combinations of input data with the optimum number of clusters (c = 2) for the training and testing phases of ANFIS.

Figure 8 shows the regression plots of the Wavelet-ANFIS-simulated versus the observed GWL data for the training and testing phases; in agreement with the high R², the points lie almost perfectly on the "slope = 1" regression line. The GWL time series observed and simulated with the best combination of the Wavelet-ANFIS model are presented in Figure 9. There is an acceptable correlation between the observed and simulated GWL data, and the effect of the applied delays on the input data, especially the piezometric head data, is clearly visible. All data-driven prediction methods are based on the idea that the random errors are drawn from a normal distribution, and the hybrid Wavelet-ANFIS model is no exception. Figure 10 illustrates that the a posteriori computed errors of the GWLs predicted with this model do indeed follow a normal distribution.

Figure 8. Regression plots of Wavelet-ANFIS (Sym4 mother wavelet)-simulated over observed GWL-data for training (left panel) and testing (right) phases.

Figure 9. Wavelet-ANFIS/Sym4-simulated and observed GWL-time series for the training (upper panel) and testing (lower panel) phases.

Figure 10. Distribution of the GWL-random errors of the wavelet-ANFIS model.

The results presented above show exceptionally good and reliable predictions of the average GWL of the Maragheh plain with the proposed hybrid Wavelet-ANFIS model. The average GWL is the first and an appropriate criterion for evaluating GWL changes over the whole plain. To understand the spatial variations across the piezometer network in the study plain, however, the GWL fluctuations in the 25 observation wells were also simulated individually with the hybrid model. The calculated values of R² and RMSE for both the training and test phases are given in Figure 11. Since most R² values are above 0.95 and most RMSE values are below 0.20 m, especially in the test phase, it can be argued that the model performs acceptably in almost all wells.

Figure 11. Results of GWL-simulation in all 25 wells (see Figure 2).

The results clearly indicate that a vital parameter of the groundwater budget, namely GWL at time t, can be simulated and predicted accurately with the proposed hybrid model, using GWL values from the previous months and precipitation data as inputs. Although this is a data-driven model, choosing relevant inputs is important because, as illustrated in this study, the model considers groundwater level changes that are directly related to changes in groundwater storage (ΔS). Furthermore, the difference between the volume of precipitation and ΔS is equal to the sum of the other components of the groundwater budget equation, including irrigation water, domestic and industrial demands, as well as natural hydrological processes.

4. Conclusions

This study developed and evaluated hybrid Wavelet-ANFIS models based on the FCM method for groundwater level (GWL) simulation and prediction, using the Maragheh plain in northwestern Iran as a case study. In the developed models, GWL was predicted using antecedent GWL data and precipitation data. A monthly GWL data series recorded at 25 observation wells during a 22-year period was taken into account. Because the GWL data series were not complete, the missing values were interpolated using the IDW method. After data pre-processing, various statistical analyses were carried out to investigate the time-lag correlation of the GWL time series. It was established that acceptable simulations of the GWL value for a particular month require GWL and P data values from the 3–4 preceding months. The seven time-lagged inputs were used directly in the ANFIS model to predict GWL, while in the hybrid Wavelet-ANFIS model, the GWL and P time series were decomposed up to level 4 by means of DWT-MRA.

The performance of the developed hybrid model was examined at four decomposition levels. Input data were divided into two parts: one for the simulation (training) phase and the other for the prediction (testing) phase. In this study, the training set comprises 70% of the data and the test (prediction) set the remaining 30%. The best values of RMSE and R² were 0.05 m and 0.997, respectively, in the training phase and 0.08 m and 0.974 in the testing (prediction) phase, obtained with the hybrid Wavelet-ANFIS model using the Symlet mother wavelet and decomposition level 3. According to the results of the hybrid Wavelet-ANFIS model, the model performance was acceptable in estimating GWL fluctuations all over the plain, such that the difference between the observed and simulated values was negligible. The incorporation of the wavelet transform therefore increased the performance of the ANFIS model, especially in the prediction phase. Another noteworthy characteristic of this coupled model is the use of the FCM clustering method, which generates fewer fuzzy rules, thereby overcoming the well-known data dimensionality problem.

Finally, it can be argued that the Wavelet-ANFIS/FCM clustering model is an attractive and reliable tool for groundwater level prediction under different water resources management scenarios, especially when a lack of data means that physical processes such as surface water-groundwater interactions are not completely understood. Nonetheless, there is a need to further evaluate the developed model in other regions, or in the same plain but with different GWL and P time series.

Author Contributions

Conceptualization, H.O. and M.Z.; methodology, M.Z.; software, M.M.J.; validation, H.O., M.M.J. and M.Z.; writing—original draft preparation, M.M.J. and H.O.; writing—review and editing, G.J.-P.S.; supervision, H.O., G.J.-P.S. and M.Z. All authors have read and agreed to the published version of the manuscript.

Funding

The time and effort of M.Z. was supported by the Luxembourg National Research Fund (FNR) in the framework of an industrial fellowship with Ref. No. 14111453.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data available on request. The data presented in this study are available on request from the first author (M.M.J.). The ground water data in Iran are not publicly available but can be requested from respective water authorities.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Koch, M. Challenges for future sustainable water resources management in the face of climate change. In Proceedings of the 1st NPRU Academic Conference, Nakhon Pathom University, Nakhon Pathom, Thailand, 23–24 October 2008.
2. Nayak, P.; SatyajiRao, Y.; Sudheer, K. Groundwater level forecasting in a shallow aquifer using artificial neural network. Water Resour. Manag. 2006, 20, 77–90.
3. Jolly, I.D.; McEwan, K.L.; Holland, K.L. A review of groundwater–surface water interactions in arid/semiarid wetlands and the consequences of salinity for wetland ecology. Ecohydrology 2008, 1, 43–58.
4. Prinos, S.T.; Lietz, A.C.; Irvin, R.B. Design of a real-time groundwater level monitoring network and portrayal of hydrologic data in Southern Florida. In USGS Water Resources Investigations Report; Geological Survey (US): Reston, VA, USA, 2002.
5. Lijun, F.; Shuquan, L. Forecasting the runoff using least square support vector machine. Tianjin Teach. Comm. 2007, TJGL06-099, 884–889.
6. Nourani, V.; Alami, M.T.; Aminfar, M.H. Combined neural-wavelet model for prediction of Ligvanchayi watershed precipitation. Eng. Appl. Artif. Intell. 2009, 22, 466–472.
7. Behzad, M.; Asghari, K.; Coppola, E.J.R. Comparative study of SVMs and ANNs in aquifer water level prediction. J. Comput. Civ. Eng. 2010, 24, 408–413.
8. Yoon, H.; Jun, S.C.; Hyun, Y.; Bae, G.O.; Lee, K.K. A comparative study of artificial neural networks and support vector machines for predicting groundwater levels in a coastal aquifer. J. Hydrol. 2011, 396, 128–138.
9. Shiri, J.; Kisi, O.; Yoon, H.; Lee, K.-K.; Hossein Nazemi, A. Predicting groundwater level fluctuations with meteorological effect implications: A comparative study among soft computing techniques. Comput. Geosci. 2013, 56, 32–44.
10. Mirzavand, M.; Khoshnevisan, B.; Shamshirband, S.; Kisi, O.; Ahmad, R.; Akib, S. Evaluating groundwater level fluctuation by support vector regression and neuro fuzzy methods: A comparative study. Nat. Hazards 2015, 102, 1611–1612.
11. Zare, M.; Koch, M. Groundwater level fluctuations simulation and prediction by ANFIS- and hybrid Wavelet-ANFIS/Fuzzy C-Means (FCM) clustering models: Application to the Miandarband plain. J. Hydro-Environ. Res. 2018, 18, 63–76.
12. Srivastav, R.K.; Sudheer, K.P.; Chaubey, I. A simplified approach to quantifying predictive and parametric uncertainty in artificial neural network hydrologic models. Water Resour. Res. 2007, 43.
13. Jang, J.S.R. ANFIS: Adaptive-network-based fuzzy inference system. IEEE Trans. Syst. Man Cybern. 1993, 23, 665–685.
14. Shiri, J.; Kisi, O. Comparison of genetic programming with neuro-fuzzy systems for predicting short-term water table depth fluctuations. Comput. Geosci. 2011, 37, 1692–1701.
15. Amutha, R.; Porchelvan, P. Seasonal prediction of groundwater levels using ANFIS and radial basis neural network. Int. J. Geol. Earth Environ. Sci. 2011, 1, 98–108.
16. Moosavi, V.; Vafakhah, M.; Shirmohammadi, B.; Ranjbar, M. Optimization of Wavelet-ANFIS and Wavelet-ANN Hybrid Models by Taguchi Method for Groundwater Level Forecasting. Arab. J. Sci. Eng. 2014, 39, 1785–1796.
17. Wen, X.; Feng, Q.; Yu, H.; Wu, J.; Si, J.; Chang, Z.; Xi, H. Wavelet and adaptive neuro-fuzzy inference system conjunction model for groundwater level predicting in a coastal aquifer. Neural Comput. Appl. 2015, 26, 1203–1215.
18. Emamgholizadeh, S.; Moslemi, K.; Karami, G.H. Prediction the groundwater level of Bastam Plain (Iran) by Artificial Neural Network (ANN) and Adaptive Neuro-Fuzzy Inference System (ANFIS). Water Resour. Manag. 2014, 28, 5433–5446.
19. Khaki, M.; Yusoff, I.; Islami, N. Application of the Artificial Neural Network and Neuro-fuzzy System for Assessment of Groundwater Quality. Clean Soil Air Water 2014, 42, 1–10.
20. Rajaee, T.; Ebrahimi, H.; Nourani, V. A review of the artificial intelligence methods in groundwater level modeling. J. Hydrol. 2019, 572, 336–351.
21. Zare, M.; Koch, M. Using ANN and ANFIS Models for simulating and predicting groundwater level fluctuations in the Miandarband Plain, Iran. In Proceedings of the 4th IAHR Europe Congress, Liege, Belgium, 27–29 July 2016; pp. 416–423.
22. Vaidhehi, V. The role of Dataset in training ANFIS system for course Advisor. Int. J. Innov. Res. Adv. Eng. 2014, 1, 249–253.
23. Mirrashid, M. Earthquake magnitude prediction by adaptive neuro-fuzzy inference system (ANFIS) based on fuzzy C-means algorithm. Nat. Hazards 2014, 74, 1577–1593.
24. Kuczera, G. Uncorrelated measurement error in flood frequency inference. Water Resour. Res. 1992, 28, 183–188.
25. Sang, Y.F.; Wang, D.; Wu, J.C.; Zhu, Q.P.; Wang, L. The relation between periods’ identification and noises in hydrologic series data. J. Hydrol. 2009, 368, 165–177.
26. Donoho, D.L. De-noising by soft-thresholding. IEEE Trans. Inform. Theory 1995, 41, 613–627.
27. Elshorbagy, A.; Simonovic, S.P.; Panu, U.S. Noise reduction in chaotic hydrologic time series: Facts and doubts. J. Hydrol. 2002, 256, 147–165.
28. Torrence, C.; Compo, G.P. A practical guide to wavelet analysis. Bull. Am. Meteorol. Soc. 1998, 79, 61–78.
29. Jansen, M. Minimum risk thresholds for data with heavy noise. IEEE Signal Process. Lett. 2006, 13, 296–299.
30. Nourani, V.; Hosseini Baghanam, A.; Adamowski, J.; Kisi, O. Applications of hybrid wavelet–artificial intelligence models in hydrology: A review. J. Hydrol. 2014, 514, 358–377.
31. Nourani, V.; Alami, M.T.; Daneshvar Vousoughi, F. Wavelet-entropy data pre-processing approach for ANN-based groundwater level modeling. J. Hydrol. 2015, 524, 255–269.
32. Ebrahimi, H.; Rajaee, T. Simulation of groundwater level variations using wavelet combined with neural network, linear regression and support vector machine. Glob. Planet. Chang. 2016, 148, 181–191.
33. Barzegar, R.; Fijani, E.; Asghari Moghaddam, A.; Tziritis, E. Forecasting of groundwater level fluctuations using ensemble hybrid multi-wavelet neural network-based models. Sci. Total Environ. 2017, 599–600, 20–31.
34. Malekzadeh, M.; Kardar, S.; Saeb, K.; Shabanlou, S.; Taghavi, L. A novel approach for prediction of monthly ground water level using a hybrid wavelet and non-tuned self-adaptive machine learning model. Water Resour. Manag. 2019, 33, 1609–1628.
35. Shirmohammadi, B.; Moradi, H.; Moosavi, V.; Semiromi, M.T.; Zeinali, A. Forecasting of meteorological drought using Wavelet-ANFIS hybrid model for different time steps (case study: Southeastern part of east Azerbaijan province, Iran). Nat. Hazards 2013, 69, 389–402.
36. Shafaei, M.; Kisi, O. Lake Level Forecasting Using Wavelet-SVR, Wavelet-ANFIS and Wavelet-ARMA Conjunction Models. Water Resour. Manag. 2015, 30, 79–97.
37. Karthika, B.S.; Deka, P.C. Prediction of air temperature by hybridized model (Wavelet-ANFIS) using wavelet decomposed data. Aquat. Procedia 2015, 4, 1155–1161.
38. Moosavi, V.; Vafakhah, M.; Shirmohammadi, B.; Behnia, N. A Wavelet-ANFIS hybrid model for groundwater level forecasting for different prediction periods. Water Resour. Manag. 2013, 27, 1301–1321.
39. Sang, Y.F.; Wang, D.; Wu, J.C.; Zhu, Q.P.; Wang, L. Entropy-based wavelet de-noising method for time series analysis. Entropy 2009, 11, 1123–1147.
40. Markovic, D.; Koch, M. Stream response to precipitation variability: A spectral view based on analysis and modelling of hydrological cycle components. Hydrol. Process. 2015, 29, 1806–1816.
41. Zhang, Y.-K.; Li, Z. Temporal scaling of hydraulic head fluctuations: Nonstationary spectral analyses and numerical simulations. Water Resour. Res. 2005, 41, 7–10.
42. Jang, J.S.R.; Sun, C.T.; Mizutani, E. Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence; Prentice Hall: Upper Saddle River, NJ, USA, 1997.
43. Tahmasebi, P.; Hezarkhani, A. A hybrid neural networks-fuzzy logic-genetic algorithm for grade estimation. Comput. Geosci. 2012, 42, 18–27.
44. Takagi, T.; Sugeno, M. Fuzzy identification of systems and its applications to modeling and control. In IEEE Transactions on Systems Man and Cybernetics; IEEE: Piscataway, NJ, USA, 1985; pp. 116–132.
45. Cobaner, M. Evapotranspiration estimation by two different neuro-fuzzy inference systems. J. Hydrol. 2011, 398, 292–302.
46. Bezdek, J. Cluster validity with fuzzy sets. J. Cybern. 1973, 3, 58–73.
47. Bezdek, J.C. Pattern Recognition with Fuzzy Objective Function Algorithms; Plenum Press: New York, NY, USA, 1981; pp. 1–256.
48. Ayvaz, M.T.; Karahan, H.; Aral, M.M. Aquifer parameter and zone structure estimation using kernel-based fuzzy c-means clustering and genetic algorithm. J. Hydrol. 2007, 343, 240–253.
49. Grossmann, A.; Morlet, J. Decompositions of Hardy functions into square integrable wavelets of constant shape. SIAM J. Math. Anal. 1984, 15, 723–736.
50. Adamowski, J.; Sun, K. Development of a coupled wavelet transform and neural network method for flow forecasting of non-perennial rivers in semi-arid watersheds. J. Hydrol. 2010, 390, 85–91.
51. Singh, R.M. Wavelet-ANN model for flood events. In Proceedings of the International Conference on Soft Computing for Problem Solving (SocProS 2011), Springer India, Roorkee, India, 20–22 December 2011; pp. 165–175.
52. Partal, T. River flow forecasting using different artificial neural network algorithms and wavelet transform. Can. J. Civ. Eng. 2009, 36, 26–38.
53. Adamowski, J.; Fung Chan, H. A wavelet neural network conjunction model for groundwater level forecasting. J. Hydrol. 2011, 407, 28–40.
54. Kisi, O.; Cimen, M. A wavelet-support vector machine conjunction model for monthly streamflow forecasting. J. Hydrol. 2011, 399, 132–140.
55. Catalão, J.P.S.; Pousinho, H.M.I.; Mendes, V.M.F. Hybrid wavelet-PSO-ANFIS approach for short-term electricity prices forecasting. IEEE Trans. Power Syst. 2011, 26, 137–144.
56. Remesan, R.; Bray, M.; Shamim, M.A.; Han, D. Rainfall-runoff modelling using a wavelet-based hybrid SVM scheme. Hydroinformatics in hydrology, hydrogeology and water resources. In Proceedings of the Symposium JS.4 at the Joint Convention of the International Association of Hydrological Sciences (IAHS) and the International Association of Hydrogeologists (IAH), Hyderabad, India, 6–12 September 2009; pp. 41–50.
57. Mallat, S.G. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell. 1989, 11, 674–693.
58. Nourani, V.; Alami, M.T.; Daneshvar Vosoughi, F. Hybrid of SOM-clustering method and wavelet-ANFIS approach to model and infill missing groundwater level data. J. Hydrol. Eng. 2016, 21, 05016018.

Figure 1. The Maragheh plain in northwestern Iran.

Figure 2. Borehole locations with Thiessen polygons.

Figure 3. Average groundwater level (GWL) fluctuations in the Maragheh plain during 1996–2018.

Figure 4. Autocorrelation function of GWL (a) and cross-correlation function between GWL and precipitation (b).

Figure 5. Architecture of ANFIS [11].

Figure 6. Flow diagram of the combination of fuzzy c-means clustering (FCM) method and ANFIS [11].

Figure 7. RMSE values for different numbers of clusters in the FCM method.

Figure 8. Regression plots of simulated versus observed GWL for the Wavelet-ANFIS model (Sym4 mother wavelet) in the training (left panel) and testing (right panel) phases.

Figure 9. Observed GWL time series and the series simulated by the Wavelet-ANFIS/Sym4 model for the training (upper panel) and testing (lower panel) phases.

Figure 10. Distribution of the random GWL errors of the Wavelet-ANFIS model.

Figure 11. GWL simulation results for all 25 wells (see Figure 2).

Table 1. List of the 25 wells and the areas of their Thiessen polygons.

ID	Location	Area (ha)	Elevation (m)	ID	Location	Area (ha)	Elevation (m)
1	Seraju (khoshkbarchi)	1194	1397.2	14	Akhond gheshlagh	650	1280.58
2	Sezavar mousavi	1659	1331	15	Alghou	2099	1336.31
3	Varjouy	1801	1404.8	16	Pahr abad	2243	1413.41
4	Yengi kand khoushe mehr	3694	1327.92	17	Khoushe mehr	1007	1314.25
5	Rousht bozorg istgahe rah ahan	2096	1337.89	18	Rousht kouchak	921	1296.26
6	Khaneh bargh ghadim	931	1287.01	19	Bonab energy atomi	1527	1280.25
7	Gharah chopogh behdasht	1553	1285.09	20	Bonab mantagheh abiyari	1089	1282.14
8	Khalilvand rouberouye ajorpazi	998	1286.82	21	Jadeh sarj baghal kanal	1454	1345.71
9	Maragheh foroudgah	1094	1325.23	22	Khanghah ghabrestan	2126	1359.17
10	Zavasht ghabrestan	1416	1297.9	23	Maragheh khajeh nasir	3251	1474.24
11	Varjouy maleki	1465	1371.99	24	Bonab aval rah darya	1529	1284.25
12	Khezerlou shourgol	1645	1284.4	25	Chelghay masjed	2285	1305.22
13	Zavaregh bimarestan	1706	1292.36	-	-	-	-

Table 2. Number of missing values for wells with missing data.

Well ID	3	8	10	12	18	20	24
Number of Missing Values	17	16	27	17	30	7	5

Table 3. Results of the various model combinations of input data with the optimum number of clusters (c = 2) for the training and testing phases of ANFIS.

Method	Decomposition Level	ANFIS Inputs	Training RMSE (m)	Training R²	Testing RMSE (m)	Testing R²
ANFIS	-	GL(t−1), GL(t−2), GL(t−3)	0.16	0.962	0.19	0.870
Wavelet-ANFIS/haar	1	-	0.11	0.983	0.14	0.923
Wavelet-ANFIS/haar	2	-	0.11	0.983	0.14	0.921
Wavelet-ANFIS/haar	3	-	0.10	0.984	0.19	0.878
Wavelet-ANFIS/haar	4	-	0.11	0.983	0.16	0.891
Wavelet-ANFIS/db4	1	-	0.09	0.987	0.13	0.949
Wavelet-ANFIS/db4	2	-	0.07	0.992	0.11	0.964
Wavelet-ANFIS/db4	3	-	0.07	0.995	0.08	0.972
Wavelet-ANFIS/db4	4	-	0.06	0.996	0.14	0.921
Wavelet-ANFIS/sym4	1	-	0.10	0.951	0.14	0.933
Wavelet-ANFIS/sym4	2	-	0.08	0.99	0.12	0.955
Wavelet-ANFIS/sym4	3	-	0.06	0.994	0.09	0.969
Wavelet-ANFIS/sym4	4	-	0.06	0.994	0.09	0.966
ANFIS	-	GL(t−1), GL(t−2), GL(t−3), P(t−1), P(t−2), P(t−3), P(t−4)	0.13	0.975	0.18	0.878
Wavelet-ANFIS/haar	1	-	0.09	0.988	0.16	0.891
Wavelet-ANFIS/haar	2	-	0.09	0.988	0.13	0.927
Wavelet-ANFIS/haar	3	-	0.09	0.988	0.13	0.929
Wavelet-ANFIS/haar	4	-	0.09	0.988	0.13	0.925
Wavelet-ANFIS/db4	1	-	0.08	0.991	0.12	0.949
Wavelet-ANFIS/db4	2	-	0.07	0.994	0.09	0.967
Wavelet-ANFIS/db4	3	-	0.06	0.996	0.09	0.968
Wavelet-ANFIS/db4	4	-	0.06	0.997	0.1	0.959
Wavelet-ANFIS/sym4	1	-	0.08	0.99	0.13	0.931
Wavelet-ANFIS/sym4	2	-	0.07	0.993	0.11	0.954
Wavelet-ANFIS/sym4	3	-	0.06	0.995	0.08	0.974
Wavelet-ANFIS/sym4	4	-	0.05	0.996	0.08	0.974
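
The RMSE and R² columns of Table 3 summarize how closely simulated GWL follows observed GWL. The sketch below shows one way such scores could be computed from paired series. It is an illustration under assumptions, not the authors' code: the function names are hypothetical, the sample values are invented, and R² is taken here to be the squared Pearson correlation, a definition this section does not confirm.

```python
import math


def rmse(observed, simulated):
    """Root mean square error, in the units of the inputs (metres for GWL)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - s) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, simulated):
    """Squared Pearson correlation (assumed definition of R², see lead-in)."""
    n = len(observed)
    if n != len(simulated) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    if var_o == 0 or var_s == 0:
        raise ValueError("R² is undefined for a constant series")
    return cov ** 2 / (var_o * var_s)


# Hypothetical GWL values in metres, for illustration only (not study data).
observed = [1300.0, 1300.2, 1300.1, 1299.9]
simulated = [1300.1, 1300.1, 1300.2, 1299.8]

print("RMSE (m):", round(rmse(observed, simulated), 3))
print("R2:", round(r_squared(observed, simulated), 3))
```

If R² were instead defined as the coefficient of determination (one minus the ratio of residual to total sum of squares), the two scores would differ whenever the simulated series is biased, so the definition matters when comparing against Table 3.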

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
