Artificial Neural Network Optimized with a Genetic Algorithm for Seasonal Groundwater Table Depth Prediction in Uttar Pradesh, India
Sustainability 2020, 12, 8932

Accurate prediction of groundwater levels is crucial for effective planning and management of groundwater resources. In the present study, an Artificial Neural Network (ANN) optimized with a Genetic Algorithm (GA-ANN) was employed for seasonal groundwater table depth (GWTD) prediction in the area between the Ganga and Hindon rivers in Uttar Pradesh State, India. A total of 18 models (nine for the pre-monsoon and nine for the post-monsoon season) were formulated using groundwater recharge (GWR), groundwater discharge (GWD), and previous groundwater level data from a 21-year period (1994–2014). The predictive ability of the hybrid GA-ANN models was evaluated against the traditional GA models based on statistical indicators and visual inspection. The results indicate that the hybrid GA-ANN models outperformed the GA models in predicting the seasonal GWTD in the study region. Overall, the hybrid GA-ANN-8 model with an 8-9-1 structure (8 inputs, 9 neurons in the hidden layer, and 1 output) was identified as optimal for predicting the GWTD during the pre- and post-monsoon seasons. Additionally, it was noted that using the maximum number of input variables in the hybrid GA-ANN approach improved the prediction accuracy. In conclusion, the findings of the proposed hybrid GA-ANN model could be readily transferred to or implemented in other parts of the world, specifically those with similar geological and hydrogeological conditions, for sustainable planning and management of groundwater resources.


Introduction
Groundwater is one of the most vital natural resources. It promotes healthy human life, economic growth, and environmental sustainability, and it is a reliable source of water in all climatic regions of the world [1]. Due to rapid population growth, industrial development, agricultural activities, and increased domestic use, most of the world's countries will face a freshwater shortage problem [2]. The spatial-temporal variation, discrepancies of groundwater resources, and increased groundwater dependence have also impacted groundwater levels [3,4]. Physically based models require explicit knowledge of the study region's physical properties (characterization and quantification), boundary conditions, and large datasets; obtaining these is usually laborious, costly, and time-consuming [1,5]. To overcome these difficulties, machine learning-based models have proved able to solve large, complex problems, including rainfall-runoff modeling [6][7][8], hydrometeorological drought [...] square error (RMSE), coefficient of variation of error residuals (CVRE), absolute prediction error (APE), and performance index (PI)) and through visual interpretation for optimal utilization of groundwater resources in the study area.

Description of the Study Area
The study region is situated between the Hindon River and the Ganga River and consists of an alluvial cover of the Gangetic plain covering approximately 563,647 hectares, with elevations varying from 215 m to 273 m above mean sea level (MSL). The study region lies between latitudes 28°66′ N and 29°92′ N and longitudes 77°46′ E and 78°02′ E and consists of 24 blocks: 3 blocks of Saharanpur district (Baliakheri, Nagal, Deoband), 6 blocks of Muzaffarnagar district (Charthawal, Purkaji, Muzaffarnagar, Shahpur, Janpath, Khatauli), 12 blocks of Meerut district (Sardhana, Rohta, Daurala, Mawana, Meerut, Janikurd, Saroorpur, Paricchitgarh, Hastinapur, Rajpura, Kharkhoda, Macchra), and 3 blocks of Ghaziabad district (Muradnagar, Razapur, Bhojpur) of Uttar Pradesh. Figure 1 shows the location map of the study area with all the blocks. The climate of the entire study region is subtropical monsoon, with annual rainfall ranging from 933 mm to 1204 mm.

Hydrogeology of Study Area and Data Acquisition
The study region covers alluvium plain, consisting of sand, silt, clay, and minerals like sodium carbonate, sodium chloride, and sodium sulfate with calcium and magnesium, and detrital traces in varying proportions. Typically, the deposit of the sand bed contains the groundwater in the area. The abstractions of groundwater are utilized for irrigation, drinking, industrial, and domestic purposes in the study area. The groundwater abstraction rate is 942 m³/h, while the demand is estimated to be between 5069 and 12,672 m³/h [70]. This study location has heterogeneous types of aquifers, which are divided into three categories:

• Aquifer type group I is composed of different types of basalt rocks, like weathered, dense, and vesicular. Groundwater occurs under water table conditions 30 m or lower than ground level. The tube well discharge varies from 97 to 227 m³/h for drawdown between 2.68 m and 0.68 m.

Groundwater table depth data for the pre- and post-monsoon seasons were collected from the Groundwater Department of Uttar Pradesh (India) for the same period. Figure 2 shows the location of observation wells in the study area. The pre-monsoon, post-monsoon, monsoon, and non-monsoon seasons are March-May, October-December, June-September, and January-February, respectively. Figure 3a,b shows the variation of groundwater table depth measured at 38 observation wells during the pre- and post-monsoon seasons. The study region's statistical data on the number of minor irrigation structures, the area under minor and major crops, the area irrigated by minor irrigation structures, and the human and livestock populations were taken from the corresponding district statistical departments.

Genetic Algorithm (GA)
The GA is a robust metaheuristic optimization algorithm inspired by natural and biological selection, based on Darwin's theory of survival [71,72]. The GA makes no assumptions such as linearity, stationarity, or uniformity, and does not depend on any specific conceptual phenomenon. It involves chromosomes, a population set, a fitness function, mutation, and selection steps. It works with a set of candidate solutions, called a population, each of which is represented by a chromosome. Solutions are taken from one population and used to form a new population, based on the idea that the new population will be better than the old one.
Furthermore, solutions are chosen to produce new solutions (offspring) according to the fitness function. This procedure is repeated until the number of offspring in the new population equals the number of parents in the initial population. Two genetic operators are used in these processes: crossover and mutation. In this study, double-point crossover and Gaussian mutation operators were used with a crossover and mutation probability of 0.01. Figure 4 shows the general process chart of the genetic algorithm. The necessary steps of the GA are outlined below:


1. Start: Generate a random population of chromosomes.
2. Fitness: Evaluate the fitness function for every chromosome in the population.
3. New Population: Create a new population by repeating the following steps until the new population is complete.
   (a) Selection: Based on their fitness, select two parent chromosomes from the population.
4. Replace: Use the newly created population for the next run of the algorithm.
5. Test: If the end condition is satisfied, stop and return the best solution in the current population.
6. Loop: Go back to Step 2.
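
The GA steps above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' MATLAB implementation: the tournament selection, the elitism step, the default probabilities, and all function names (`run_ga`, `two_point_crossover`, `gaussian_mutation`, `tournament`) are assumptions made for this example. The fitness is treated as an error to be minimized, and chromosomes need at least three genes so that two distinct cut points exist.

```python
import random

def two_point_crossover(p1, p2, rng):
    """Swap the gene segment lying between two random cut points."""
    i, j = sorted(rng.sample(range(1, len(p1)), 2))
    return p1[:i] + p2[i:j] + p1[j:], p2[:i] + p1[i:j] + p2[j:]

def gaussian_mutation(chrom, rate, sigma, rng):
    """Add zero-mean Gaussian noise to each gene with probability `rate`."""
    return [g + rng.gauss(0.0, sigma) if rng.random() < rate else g
            for g in chrom]

def tournament(pop, fitness, rng, k=3):
    """Selection (assumed scheme): best of k randomly drawn chromosomes."""
    return min(rng.sample(pop, k), key=fitness)

def run_ga(fitness, n_genes, pop_size=50, generations=100,
           p_cross=0.9, p_mut=0.1, sigma=0.1, seed=0):
    """Minimize `fitness` over real-valued chromosomes (illustrative defaults)."""
    rng = random.Random(seed)
    # Step 1 (Start): random initial population with genes in [0, 1].
    pop = [[rng.uniform(0, 1) for _ in range(n_genes)]
           for _ in range(pop_size)]
    best = min(pop, key=fitness)              # Step 2 (Fitness)
    for _ in range(generations):              # Step 6 (Loop)
        new_pop = [best[:]]                   # elitism (assumption)
        while len(new_pop) < pop_size:        # Step 3 (New population)
            a = tournament(pop, fitness, rng)
            b = tournament(pop, fitness, rng)
            if rng.random() < p_cross:
                a, b = two_point_crossover(a, b, rng)
            new_pop.append(gaussian_mutation(a, p_mut, sigma, rng))
            if len(new_pop) < pop_size:
                new_pop.append(gaussian_mutation(b, p_mut, sigma, rng))
        pop = new_pop                         # Step 4 (Replace)
        best = min(pop, key=fitness)          # Step 2 (Fitness)
    return best                               # Step 5 (Test): generation limit reached
```

Because the best chromosome is copied unchanged into each new population, the best error in this sketch can never get worse from one generation to the next. The probabilities are ordinary parameters, so the value of 0.01 reported in this study can be passed in place of the defaults.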

Hybrid Genetic Algorithm-Artificial Neural Network (GA-ANN)
In this research, the hybrid GA-ANN model was developed by coupling the ANN with the GA in a single topology. A standalone ANN model suffers from certain drawbacks, such as getting trapped in local minima and slow learning. Therefore, combining an optimization algorithm such as the GA with the ANN can significantly improve ANN efficiency [73][74][75] and mitigate these weaknesses. The integrated GA-ANN strategy achieves this goal in two steps:


• The GA technique is used to improve the topology of the ANN and its variables.
• The optimal response is obtained using the ANN.

In this study, the GA method was chosen to optimize the number of hidden neurons, the weights, and the bias values of the ANN models. The GA variables, such as the crossover probability, selection method, mutation rate, population size, and number of generations, were determined by a trial-and-error procedure; the details of the GA parameters are summarized in Table 1. The flow diagram of the proposed hybrid GA-ANN technique is depicted in Figure 5.


Determination of the Parameters of the ANN Model
Architecture: Sets the relationship between a series of inputs and the desired outputs. An ANN with a single hidden layer was used to forecast the seasonal GWTD in the study area.
Training algorithm: Training is an iterative process in which optimization techniques update the ANN model parameters, such as connection weights and bias values. Training stops after a number of iterations or when it converges to a specified minimum error. In this study, GA optimization techniques were employed to reduce the error between the target and the predicted values.
Activation function: Converts the input signal of a neuron to its output. In this study, a linear transfer function in the output layer and a logistic sigmoid transfer function in the hidden layer were used for the ANN models. The output of the logistic sigmoid function ranges between 0 and 1.
Learning rate: The efficiency of training is highly sensitive to the choice of learning rate. A non-conventional GA optimization method was used to determine a favorable learning rate.
Hidden neuron optimization: In general, a trial-and-error procedure is used to determine the best number of hidden neurons. A few researchers have shown the utility of the GA for optimizing hidden neurons [76,77]. Hence, in this study, the GA optimization method was used to determine the optimum number of hidden neurons.
Error function: Denoted by E, the mean square error is used for the optimization of the weights and is described by Equation (1):

$$E = \frac{1}{n}\sum_{i=1}^{n}\left(a_i - p_i\right)^2 \tag{1}$$

where $a_i$ is the actual value, $p_i$ is the predicted value, and $n$ is the number of observations.
Weight optimization: In this study, error-correction learning was used to obtain favorable connection weights by reducing the error between a neuron's actual output and its target response. The initial weights were chosen from the range 0 to 1 and were then optimized using the GA technique.
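
To make the link between the ANN parameters and the GA concrete, the sketch below shows one way a flat chromosome could be decoded into the weights and biases of a single-hidden-layer network (logistic sigmoid hidden layer, linear output) and scored with the mean square error. The chromosome layout and all names (`decode`, `predict`, `make_fitness`, `chromosome_length`) are assumptions for illustration; the original MATLAB encoding is not described in the text.

```python
import math

def chromosome_length(n_in, n_hidden):
    """Hidden weights + hidden biases + output weights + output bias."""
    return n_in * n_hidden + 2 * n_hidden + 1

def decode(chrom, n_in, n_hidden):
    """Split a flat chromosome into network parameters (assumed layout)."""
    k = n_in * n_hidden
    w_h = [chrom[r * n_in:(r + 1) * n_in] for r in range(n_hidden)]
    b_h = chrom[k:k + n_hidden]
    w_o = chrom[k + n_hidden:k + 2 * n_hidden]
    b_o = chrom[k + 2 * n_hidden]
    return w_h, b_h, w_o, b_o

def sigmoid(z):
    """Numerically stable logistic sigmoid with output in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(chrom, x, n_hidden):
    """Forward pass: sigmoid hidden layer followed by a linear output."""
    w_h, b_h, w_o, b_o = decode(chrom, len(x), n_hidden)
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_h, b_h)]
    return sum(w * h for w, h in zip(w_o, hidden)) + b_o

def mse(actual, predicted):
    """Mean square error E between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def make_fitness(X, y, n_hidden):
    """Build the error function a GA would minimize for a fixed data set."""
    def fitness(chrom):
        return mse(y, [predict(chrom, x, n_hidden) for x in X])
    return fitness

print(chromosome_length(8, 9))  # 91 parameters for an 8-9-1 network
```

Under this assumed layout, an 8-9-1 network has 72 hidden weights, 9 hidden biases, 9 output weights, and 1 output bias, so each chromosome carries 91 genes. Optimizing the number of hidden neurons as well would need an extra gene or an outer search, which is not shown here.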

Development of GA-ANN and GA Models for GWTD Prediction
In this research, a total of 18 models (9 for pre-monsoon and 9 for post-monsoon) were developed with different input parameters (groundwater discharge, groundwater recharge, and antecedent water table depth), as listed in Table 2, for predicting the seasonal GWTD in the study region. The total available data were separated into two sets: (i) training data from 1994 to 2008 (70%), and (ii) testing data from 2009 to 2014 (30%). In both seasons, the number of observations varied from 548 to 570 for training and from 206 to 228 for testing. Figure 6 illustrates the length of data used for the training and testing of the GA and GA-ANN models. All ANN and GA modeling exercises were carried out using MATLAB R2013a.
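
The chronological split described above can be checked with a short sketch. It assumes one record per observation well per year with no missing values; the record fields `well` and `year` and the helper `split_by_year` are hypothetical names used only for this example.

```python
def split_by_year(records, last_train_year=2008):
    """Chronological split: years up to `last_train_year` go to training."""
    train = [r for r in records if r["year"] <= last_train_year]
    test = [r for r in records if r["year"] > last_train_year]
    return train, test

# 38 observation wells, one seasonal value per year from 1994 to 2014.
records = [{"well": w, "year": y}
           for w in range(1, 39) for y in range(1994, 2015)]
train, test = split_by_year(records)
print(len(train), len(test))  # 570 228
```

The counts of 570 and 228 match the upper ends of the ranges reported above. The lower ends (548 and 206) presumably reflect models or seasons with fewer usable records.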

Statistical Indicators
The predictive efficacy of the formulated GA-ANN and GA models was evaluated based on eight statistical measures: coefficient of determination (R²), coefficient of efficiency (CE), correlation coefficient (r), mean absolute deviation (MAD), root mean square error (RMSE), coefficient of variation of error residuals (CVRE), absolute prediction error (APE), and performance index (PI). These indicators were computed using Equations (5)-(12) as given by [10,81], where $a_i$ is the actual value of GWTD, $p_i$ is the predicted value of GWTD, $n$ is the number of observations, $a_{avg}$ is the average of the actual GWTD values, and $p_{avg}$ is the average of the predicted GWTD values.
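
As an illustration, several of these indicators can be computed as in the sketch below. It uses common textbook definitions, which are assumed rather than taken from Equations (5)-(12): r is the Pearson correlation, R² is taken as r squared, CE is the Nash-Sutcliffe form, and MAD is the mean of the absolute errors. CVRE, APE, and PI are omitted because their definitions vary between sources.

```python
import math

def pearson_r(a, p):
    """Pearson correlation coefficient between actual and predicted values."""
    n = len(a)
    a_avg, p_avg = sum(a) / n, sum(p) / n
    cov = sum((av - a_avg) * (pv - p_avg) for av, pv in zip(a, p))
    var_a = sum((av - a_avg) ** 2 for av in a)
    var_p = sum((pv - p_avg) ** 2 for pv in p)
    return cov / math.sqrt(var_a * var_p)

def r_squared(a, p):
    """Coefficient of determination, assumed here to be r squared."""
    return pearson_r(a, p) ** 2

def coefficient_of_efficiency(a, p):
    """Nash-Sutcliffe form: 1 - SSE / variation of actual values about their mean."""
    a_avg = sum(a) / len(a)
    sse = sum((av - pv) ** 2 for av, pv in zip(a, p))
    sst = sum((av - a_avg) ** 2 for av in a)
    return 1.0 - sse / sst

def mad(a, p):
    """Mean absolute deviation between actual and predicted values."""
    return sum(abs(av - pv) for av, pv in zip(a, p)) / len(a)

def rmse(a, p):
    """Root mean square error."""
    return math.sqrt(sum((av - pv) ** 2 for av, pv in zip(a, p)) / len(a))
```

A constant offset between predicted and actual values leaves r and R² at 1 while lowering CE and raising MAD and RMSE, which is one reason for reporting several indicators together.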

Prediction of GWTD Using Traditional GA Method
Firstly, the GA technique was optimized by computing the minimum value of the root mean square error (RMSE) to build the GWTD prediction models. The generation limit, the population size, and the number of binary variables with their lower and upper limits were established according to the number of variables in the models. Table 3 shows the values of the GA parameters for the nine pre-monsoon and nine post-monsoon models. Table 3 shows that the minimum RMSE was 2.26 for GA-5, with a population size of 150 and a generation limit of 200, for the pre-monsoon season, while the minimum RMSE was 2.73 for GA-8, with a population size of 100 and a generation limit of 150, for the post-monsoon season.

Table 4 presents the values of the statistical indicators for the pre-monsoon season. The GA-5 model produced the best indicator values during the pre-monsoon season. For this model, in the testing period, the maximum values of the coefficient of determination (R²), coefficient of efficiency (CE), and correlation coefficient (r) were 0.42, 0.33, and 0.65, respectively, while the minimum values of mean absolute deviation (MAD), root mean square error (RMSE), coefficient of variation of error residuals (CVRE), absolute prediction error (APE), and performance index (PI) were 2.14, 5.11, 0.43, 0.23, and 0.03, respectively. Therefore, GA-5 was selected to forecast the pre-monsoon GWTD in the study area. For the testing data set, the observed GWTD values and those predicted by the GA-1 to GA-9 models during the pre-monsoon season are illustrated in Figure 7, which shows that the predicted pre-monsoon GWTD values were not reasonably consistent with the observed values. Of the 228 predicted GWTD values (38 nodes × 6 years) in the pre-monsoon season, only 113 were within a 10% variation of the observed values during the testing period.
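
The 10% criterion mentioned above can be expressed as a simple count. The sketch below assumes that "10% variation" means the predicted value lies within ±10% of the observed value; this interpretation, the function name `count_within_tolerance`, and the sample numbers are assumptions for illustration, not data from the study.

```python
def count_within_tolerance(observed, predicted, tol=0.10):
    """Count predictions whose relative error is at most `tol`."""
    return sum(1 for o, p in zip(observed, predicted)
               if abs(p - o) <= tol * abs(o))

# Hypothetical depths in metres, for illustration only.
observed = [10.0, 10.0, 10.0, 10.0]
predicted = [10.5, 10.9, 11.5, 8.0]
print(count_within_tolerance(observed, predicted))  # 2
```

Under this reading, 113 of 228 values corresponds to just under half of the testing predictions (about 49.6%).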
Similarly, the values of the performance indicators during the post-monsoon season are given in Table 5. In the post-monsoon season, the GA-8 model produced the highest values of R², CE, and r at 0.47, 0.68, and 0.31, respectively, while the values of MAD, MSE, CVRE, APE, and PI were the lowest at 1.87, 7.45, 0.47, 0.22, and 0.03, respectively. The GA-8 model was therefore selected as the best to predict GWTD for the post-monsoon season in the study area. Figure 8 compares the observed and predicted values of GWTD in the post-monsoon season for the testing dataset. It can be seen in Figure 8 that the GA-8 model has less scatter than the other models. Based on this assessment, it can be concluded that the GA models were able to recognize the trend of the groundwater table depth data during both seasons but were not able to predict the GWTD accurately in the study region.


Prediction of GWTD Using GA-ANN Models
The training and testing results showing the effect of population size on the mean square error (MSE) for the pre- and post-monsoon seasons of all GA-ANN models are listed in Table 6. Table 6 shows that, for all models, the minimum MSE values were obtained with a population size of 50 rather than with population sizes of 100 or 200. Hence, the population size of 50 was selected as optimal for GA-ANN development to predict seasonal GWTD in the study region. The optimal number of generations, optimal population size, and respective MSE values for all GA-ANN models for the pre- and post-monsoon seasons are summarized in Table 7. In the single-hidden-layer ANN structure, the number of neurons was optimized for all the developed GA-ANN models using MATLAB R2013a. The GA optimization procedure was used to determine the number of neurons for each model. The optimal numbers of neurons for each GA-ANN model corresponding to the minimum MSE are given in Table 8.

Table 6. Effect of population size on GA-ANN models for pre- and post-monsoon seasons.

[Table 6 columns: Model; Data Set; Population Size = 50; Population Size = 100; Population Size = 200]

Table 7. Optimal population and generation for developed GA-ANN models during pre- and post-monsoon seasons.

Table 8. Structure of different developed GA-ANN models.

Finally, the values of the performance indicators of the hybrid GA-ANN models for the pre-monsoon season during the training and testing periods are listed in Table 9, which indicates that GA-ANN-8 performed better than the other GA-ANN models. The values of R², CE, and r for GA-ANN-8 were 0.91, 0.91, and 0.96, respectively, in the training period and 0.94, 0.94, and 0.97, respectively, in the testing period. The values of MAD, RMSE, CVRE, APE, and PI were 0.45, 0.22, 0.12, 0.01, and 0.03, respectively, during the training period and 0.48, 0.17, 0.11, 0.03, and 0.02, respectively, during the testing period. The GA-ANN-8 model was therefore chosen as the best to predict the pre-monsoon GWTD in the study area. The observed GWTD values and those predicted by the GA-ANN models for the pre-monsoon testing dataset are plotted in Figure 9. Figure 9 shows that the predicted values of pre-monsoon GWTD were in better agreement with the observed values during the testing period.

Similarly, for the post-monsoon season, the performance indicator values of the hybrid GA-ANN models are summarized in Table 10 for both periods, which shows that GA-ANN-8 performed significantly better than the other GA-ANN models. The values of R², CE, and r for GA-ANN-8 were 0.89, 0.90, and 0.94, respectively, during the training period and 0.95, 0.96, and 0.97, respectively, during the testing period. The values of MAD, RMSE, CVRE, APE, and PI for GA-ANN-8 were 0.56, 0.31, 0.15, 0.11, and 0.03, respectively, in training and 0.45, 0.42, 0.13, 0.10, and 0.01, respectively, in testing. The observed GWTD values and those predicted by the GA-ANN-1 to GA-ANN-9 models for the post-monsoon season throughout the testing period are illustrated in Figure 10. The predicted post-monsoon GWTD values showed a better association with the observed values in the testing period. The better performance of the GA-ANN-8 model may be due to its development using annual data from three years, including the values of previous groundwater table depth for both seasons. In contrast, the GA-ANN-1 and GA-ANN-5 models used annual data from only one and two years, respectively. Therefore, the GA-ANN-8 model was nominated as the best model to predict the post-monsoon GWTD in the study area.

This study's outcomes are consistent with studies carried out in other parts of the world to predict the groundwater level with slightly different input parameters [44,62,65,68,69,82,83], which found the performance of GAs combined with ANNs promising for the prediction of groundwater table depth in various regions. Shiri et al. [84] predicted groundwater depth (GWD) fluctuations of two coastal aquifers located in Donghae City, Korea, by employing six heuristic models: boosted regression tree (BRT), random forests (RF), multivariate adaptive regression spline (MARS), ANN, support vector machine (SVM), and gene expression programming (GEP). They found that the GEP model with tide and rainfall data provided better estimates than the other models. Some findings have also shown the potential of the genetic algorithm in conjunction with other machine learning techniques for various water resources problems [85][86][87].

This study's overall findings revealed that the hybrid GA-ANN models performed well in seasonal groundwater table prediction with varying input variables in the study area. These models were more reliable, robust, dynamic, and time-saving than the standalone GA models. This study could help hydrologists and geologists formulate an intelligent system for effective planning and management of groundwater resources and for operating the various drives in the study region. Thus, this study demonstrated the feasibility of the hybrid GA-ANN model for predicting the seasonal GWTD in the area between the Ganga and the Hindon rivers in Uttar Pradesh.

Conclusions
Under climate change and overexploitation, accurate prediction of groundwater table fluctuations is essential for managing groundwater resources. The present study investigated the comparative potential of the hybrid GA-ANN models against the traditional GA models for predicting the seasonal groundwater table depth in the area between the Ganga and the Hindon rivers. The ability of the developed models was evaluated using statistical indicators (coefficient of determination, coefficient of efficiency, correlation coefficient, mean absolute deviation, root mean square error, coefficient of variation of error residuals, absolute prediction error, and performance index), as well as through visual inspection. The results demonstrate that the GA models recognized the groundwater table depth trend but failed to predict the groundwater table depth accurately, as the maximum coefficient of determination was only 0.47. In contrast, the performance of the GA-ANN models was superior to that of the GA models for GWTD prediction in both seasons, with the highest coefficient of determination values of 0.94 and 0.95 for the pre- and post-monsoon seasons, respectively. It was also concluded that a larger number of input parameters improved the predictive accuracy of the GA-ANN models. Thus, GA-ANN-based models may be successfully applied in the field of groundwater to predict groundwater table fluctuations with reasonably good accuracy.
The models developed in this study show promising outcomes and proved to be reliable and time-saving tools for optimal planning and management of groundwater resources in the study area. The proposed model could be readily transferred or adapted to other areas, specifically those with similar hydrogeological conditions. Data accessibility and quantity remain challenging. In future research, the authors plan to establish a wireless sensor network for near real-time monitoring of groundwater levels and meteorological data in the study area.
Funding: This research received no external funding.