Exploring Artificial Intelligence Techniques for Groundwater Quality Assessment

Abstract: Freshwater quality and quantity are among the fundamental requirements for sustaining human life and civilization. The Water Quality Index (WQI) is the most widely used parameter for assessing water quality worldwide. However, the traditional approach to calculating the WQI is often complex and time consuming, since it requires handling large datasets and calculating several subindices. We investigated the performance of artificial intelligence techniques, including particle swarm optimization (PSO), a naive Bayes classifier (NBC), and a support vector machine (SVM), for predicting the WQI. The SVM and NBC were used for prediction, in conjunction with PSO for optimization. To validate the results, groundwater quality parameters and the corresponding WQI values were determined for water collected from the Pindrawan tank area in Chhattisgarh, India. Our results show that PSO–NBC predicted the WQI classes with 92.8% accuracy, whereas PSO–SVM achieved 77.60%. The study's outcomes further suggest that ensemble machine learning (ML) algorithms can be used to estimate and predict the WQI with significant accuracy. Thus, the proposed framework can be used directly to predict the WQI from measured field parameters, saving significant time and effort.


Introduction
A sufficient quantity and appropriate quality of freshwater are among the fundamental requirements for sustaining human life and civilization. Tremendous population growth and remarkable advances in science and technology have multiplied groundwater use for domestic, industrial, and irrigation purposes throughout the world over the last few decades. Rapid urbanization, overexploitation, and unscientific waste disposal have also affected the availability and quality of groundwater. Population growth and rapid urbanization have further driven the use of chemicals and pesticides in agriculture, which often leach into the groundwater. According to the World Health Organization (WHO), inappropriate or polluted water causes around 80% of all diseases in human beings. Furthermore, once groundwater is contaminated, its quality cannot be restored simply by stopping contamination at the source. Therefore, understanding and determining water quality is imperative in the study of water resources and environmental engineering.
Water quality essentially determines the usability of water from a source in terms of the nature and concentration of the impurities present in the sample [1]. As a combined effect of the continuous deterioration in water quality and quantity, approximately one billion people worldwide face a shortage of adequate and safe water. The growing scale of this problem makes it essential to monitor water quality for efficient management and supply [2,3].
The Water Quality Index (WQI) is regarded as the most efficient method for classifying water quality, and water quality is often estimated on the basis of such indices [4,5]. The WQI has been used extensively to assess the performance of water quality management approaches [6]. The approaches used to calculate and interpret water quality indices have evolved over the years [7–11]. Estimated index values have been used to indicate whether water samples are suitable for day-to-day use, and they can be used effectively in implementing water quality improvement programs.
The variables of the WQI include biological oxygen demand (BOD), temperature, dissolved oxygen (DO), total suspended solids (TSSs), ammoniacal nitrogen (AN), chemical oxygen demand (COD), and pH [12]. Groundwater quality indices (GQIs) are usually estimated from measurements of standard variables, such as magnesium (Mg²⁺), calcium (Ca²⁺), and nitrate (NO₃⁻) [13–15]. The WQI value is informative enough to support decision makers. However, estimating the WQI is not simple, because the WQI equations require a subindex to be calculated for each parameter before the subindices are aggregated. Several methods for computing the WQI are available in the literature, e.g., the United States National Sanitation Foundation Water Quality Index (NSFWQI), the British Columbia Water Quality Index (BCWQI), and the Canadian Water Quality Index (CWQI).
The WQI aims to convert complex water quality information into simple figures that researchers can readily use and that can be communicated to the general public. In some approaches applied in several countries, including India [6,16], the calculation process can be exceptionally intricate and time consuming, and it is therefore prone to unintended miscalculations [17]. The limitations of WQI calculation are thus the following: (a) it is time consuming, (b) the process is lengthy, (c) the process is complicated, and (d) different equations are used, which leads to inconsistencies. It follows from the above discussion that no standard method is available for the WQI.
To overcome these problems, some researchers have proposed a nonphysical approach that predicts the WQI using machine learning (ML) and artificial intelligence (AI) [18–20]. Once adequately trained, an AI-based model can produce a WQI value promptly, without the subindex calculations. Interest in AI algorithms is growing because of their advantages, which include nonlinear structures, the ability to capture complex trends, the ability to handle large datasets with different data scales, and insensitivity to missing data. The forecasting capability of ML-AI algorithms relies heavily on the procedures and accuracy of data collection and analysis. The continuous growth of computing power has allowed researchers to use a wide range of ML-AI models. Approaches such as artificial neural networks (ANNs) [17,21–26], adaptive neuro-fuzzy inference systems [27–31], and support vector machines (SVMs) [32] have been applied effectively to predict water quality worldwide. Abba et al. (2020) [33] describe in detail the ML-AI techniques used for WQI measurement. Most of these ML-AI algorithms perform with a certain degree of accuracy, and it is challenging to compare them on the basis of their performance [25,34].
The AI techniques used in the present study sometimes require complex manual implementation, which reduces their practical effectiveness for water quality management personnel. Practitioners are keen to learn code that they can use to solve complex models such as the one discussed above. A comprehensive comparison of such models' applications, together with the required software packages, must be carried out to improve the accuracy of predictions and the suitability of AI-based models. However, many data mining programs do not allow extensive manipulation of multiple AI models; instead, most of them support only basic methods without optimization.
Our study also aimed to develop a user-friendly interface in MATLAB for practitioners who do not have a programming background. The recommended interface is based on a nature-inspired metaheuristic classification system that integrates particle swarm optimization (PSO) with an SVM and an NBC. Water quality was forecast using fundamental AI techniques, in which the PSO algorithm was combined with SVMs for prediction. The classification and predictive AI system investigated in the study was developed using four single AI models, hybrid metaheuristic regression, and four ensembles (i.e., stacking, voting, bagging, and tiering). The baseline single models were built with two AI techniques: SVM and NBC. The ensemble models then integrated the registered single models using voting, bagging, tiering, and stacking methods. The goal of the present work was to propose a framework for flexible water quality modeling, with two aims: predictive accuracy and practical applicability of the models. The framework will enable administrators and hydrologists to choose the best analytical tools for water management using AI techniques.
These models should be selected according to specific requirements. However, applying an ensemble model can sometimes significantly enhance model accuracy and reduce computational cost. In the present study, PSO was combined with an SVM and an NBC, and a framework was proposed for predicting the WQI in the Pindrawan tank area, Raipur region, Chhattisgarh, India.

Study Area
The study area was the Pindrawan tank command area (Figure 1), situated within 81°45′–81°50′ E and 21°20′–21°25′ N in the upper Mahanadi River valley (southeastern part) and the Raipur district of Chhattisgarh, India. Nine villages lie within the study area, namely, Pauni, Amlitalab, Khauna, Deogaon, Bangoli, Dhansuli, Kurra, Baraonda, and Nilja, and the area has a tropical wet and dry climate. Temperatures in this part of India remain moderate throughout the year, with the highest temperatures observed from March to June.

Data Collection and Water Quality Estimation
Groundwater samples were collected in 2018 during the pre-monsoon period from hand pumps and bore wells at 37 sites that are used extensively for drinking in the Pindrawan tank area. Sampling points were identified using topographic sheets and GPS, and the maps were prepared using ArcGIS 10.1 (ESRI, California, USA). Topographic sheets were used to prepare the base map and identify the general features of the area, and GPS was used to determine the geographic position of each sampling point. The collected groundwater samples were analyzed for 16 parameters, namely, electrical conductivity (EC), pH, total dissolved solids (TDSs), total hardness (TH), alkalinity, bicarbonate (HCO₃⁻), chloride (Cl⁻), sulfate (SO₄²⁻), nitrate (NO₃⁻), fluoride (F⁻), calcium (Ca²⁺), magnesium (Mg²⁺), sodium (Na⁺), potassium (K⁺), iron (Fe), and chromium (Cr²⁺), following the standard methods of the American Public Health Association (APHA, 2005). EC and pH were measured in the field using EC and pH meters. Fluoride concentrations were determined by the ion-selective electrode method. TH, chloride, and alkalinity were measured by titrimetric methods. Heavy metals were measured by atomic absorption spectrophotometry, and prescribed safety measures were followed to avoid contamination.
The locations of the sampling stations are presented in Figure 1. The concentrations of the parameters were compared with the acceptable limits prescribed by BIS (2012) [35]. The permissible limits of potassium, bicarbonate, and sodium are reported in [36,37].
The WQI of the collected samples was calculated using the weighted arithmetic WQI method [38–40]. The weights (Wi) assigned to each parameter according to its impact on water quality are shown in Table 1.
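To make the weighted arithmetic calculation concrete, the following Python sketch implements one common formulation of the weighted arithmetic WQI: each parameter receives an assigned weight wi, a relative weight Wi = wi/Σwi, and a quality rating qi = 100·Ci/Si, where Ci is the measured concentration and Si is the drinking water standard; the WQI is the sum of the subindices Wi·qi. This formulation, the five class boundaries (50, 100, 200, and 300), and all numeric values in the example are assumptions for illustration, not the weights of Table 1 or the limits of Table 2. Published variants often rate pH with a special rule, which is not handled here.

```python
# Sketch of one common weighted arithmetic WQI formulation (assumed, not Table 1).

def weighted_arithmetic_wqi(concentrations, standards, weights):
    """WQI = sum_i W_i * q_i with W_i = w_i / sum(w) and q_i = 100 * C_i / S_i."""
    total_weight = sum(weights[p] for p in concentrations)
    wqi = 0.0
    for param, conc in concentrations.items():
        relative_weight = weights[param] / total_weight    # W_i
        quality_rating = 100.0 * conc / standards[param]   # q_i
        wqi += relative_weight * quality_rating            # subindex SI_i
    return wqi


# Hypothetical five-class scheme (upper bounds); Table 2 may use other limits.
WQI_CLASSES = [(50.0, "excellent"), (100.0, "good"), (200.0, "poor"), (300.0, "very poor")]


def classify_wqi(wqi):
    """Map a WQI value to its water quality class."""
    for upper, label in WQI_CLASSES:
        if wqi < upper:
            return label
    return "unfit for drinking"


if __name__ == "__main__":
    conc = {"TDS": 500.0, "F": 0.75}   # mg/L, hypothetical sample
    std = {"TDS": 500.0, "F": 1.5}     # mg/L, hypothetical standards
    w = {"TDS": 1.0, "F": 3.0}         # hypothetical assigned weights
    value = weighted_arithmetic_wqi(conc, std, w)
    print(value, classify_wqi(value))  # 62.5 good
```

Because the relative weights sum to 1, a sample whose every parameter sits exactly at its standard scores a WQI of 100, which is why 100 is a natural boundary between acceptable and poor classes in schemes of this kind.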

Utilization of AI for the Prediction of the WQI
The present study used two machine learning approaches to estimate the WQI classes, taking the values of the parameters (variables) as inputs. The 16 variables together formed one variable vector. The analysis was carried out using 1250 variable vectors (250 for each class), which were generated using PSO so as to cover the full range of every class. Of these, 250 variable vectors (50 from every class) were reserved for assessment, and the remaining vectors were used for calibration with tenfold cross-validation.
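The hold-out design described above (250 vectors per class, of which 50 per class are reserved for assessment) can be sketched as a stratified split. The sketch below is a minimal Python illustration; the helper name stratified_holdout, the random seed, and the plain random shuffle within each class are assumptions, since the text does not state how the assessment vectors were drawn.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, n_test_per_class, seed=0):
    """Return (train, test) index lists with exactly n_test_per_class test items per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)                      # random choice within the class
        test_idx.extend(members[:n_test_per_class])
        train_idx.extend(members[n_test_per_class:])
    return train_idx, test_idx


if __name__ == "__main__":
    # Hypothetical labels: five classes x 250 vectors, mirroring the counts in the text.
    labels = [c for c in range(1, 6) for _ in range(250)]
    train_idx, test_idx = stratified_holdout(labels, n_test_per_class=50)
    print(len(train_idx), len(test_idx))  # 1000 250
```

With 50 of 250 vectors per class held out, the split is exactly 80:20 and every class is equally represented in the assessment set.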

Classification and Prediction Using a PSO-SVM Approach Based on the Water Quality Index
PSO is a population-based algorithm that optimizes model parameters by simulating the collective behavior of a population. It was proposed by Eberhart and Kennedy in 1995 [42] and has been used efficiently to solve many nonlinear problems in diverse fields, such as geology [43,44], landslide analysis [45,46], forest fire mapping [47], and flood modeling [48,49]. The algorithm is initialized with a population of randomly selected solutions between the minimum and maximum limits of the parameters. Its advantages, including ease of implementation and convergence, few parameters, and suitability for parallel computing, make it an attractive choice compared with other optimization techniques. The algorithm was inspired by the behavior of a flock of birds or a school of fish searching for the shortest path to a food source [50]. Through an interactive learning process, it exchanges information between members of the population, which helps the population converge on a consistent solution. Each candidate solution is treated as a "bird", also known as a "particle", in the solution space. These interactions give the algorithm a robust search capability and good adaptability to a variety of problems. In PSO, the particles are initialized randomly, and the best particles are then found by updating the population over successive generations. In each generation, each particle is updated using two "best" values. The first is the best fitness value that the particle itself has obtained so far (the corresponding parameters are also stored); this is called the individual best value (pbest). The second is the best value obtained so far by any particle in the population; this is called the global best (gbest). The movement of the particles is guided by pbest and gbest, and whenever an improved position is found, it in turn guides the movement of the flock. In the solution space [51], each particle i is described primarily by two vectors, its velocity (Vi) and its position (Xi) [52], which are updated using Equations (1) and (2), respectively. Figure 2 shows the PSO algorithm used for particle optimization. The two vectors are updated in the dth dimension as follows:
$$v_{id}^{t+1} = w\,v_{id}^{t} + c_1 r_1 \left(p_{id} - x_{id}^{t}\right) + c_2 r_2 \left(p_{gd} - x_{id}^{t}\right) \tag{1}$$
$$x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1} \tag{2}$$
where $v_{id}^{t}$ and $x_{id}^{t}$ are the dth components of the velocity and position of particle i at generation t; w is the inertia weight, which determines how much of its current velocity a particle retains; c1 and c2 are the acceleration coefficients; r1 and r2 are independent random numbers in [0, 1]; and $p_{id}$ and $p_{gd}$ are the dth components of the particle's best-known position (pbest) and of the swarm's best-known position (gbest), respectively. The algorithm records the coordinates attained by every particle in the solution space. The best solution (fitness value) attained by a particle is called its local optimum (pbest), whereas the best solution attained by any particle in the swarm is called the global optimum (gbest). Although the particles move partly at random, their own best positions (pbest) and the group's best position (gbest) strongly influence their movement.
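As a minimal sketch of the velocity and position updates in Equations (1) and (2), the following Python function implements a global-best PSO that minimizes a fitness function within the lower and upper limits of each variable. The inertia weight and acceleration coefficients (w = 0.7, c1 = c2 = 1.5), the zero initial velocities, the clipping of positions to the limits, the immediate (asynchronous) update of gbest, and the function name pso_minimize are assumptions for illustration; the study's MATLAB implementation and parameter settings may differ.

```python
import random


def pso_minimize(fitness, lower, upper, n_particles=50, n_generations=500,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO: returns (best position, best fitness) after n_generations."""
    rng = random.Random(seed)
    dim = len(lower)
    # Random initial positions between the limits; velocities start at zero (assumption).
    x = [[rng.uniform(lower[d], upper[d]) for d in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in x]                      # best position found by each particle
    pbest_val = [fitness(p) for p in x]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # best position found by the swarm

    for _ in range(n_generations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()   # independent numbers in [0, 1)
                # Equation (1): inertia + cognitive (pbest) + social (gbest) terms
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])
                           + c2 * r2 * (gbest[d] - x[i][d]))
                # Equation (2), followed by clipping to the variable's limits
                x[i][d] = min(max(x[i][d] + v[i][d], lower[d]), upper[d])
            val = fitness(x[i])
            if val < pbest_val[i]:                    # update pbest
                pbest[i], pbest_val[i] = x[i][:], val
                if val < gbest_val:                   # update gbest immediately
                    gbest, gbest_val = x[i][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    # Toy fitness (sphere function) whose minimum is 0 at the origin.
    best, best_val = pso_minimize(lambda p: sum(c * c for c in p),
                                  [-5.0, -5.0], [5.0, 5.0], n_generations=200)
    print(best, best_val)
```

In the present study the fitness would be the WQI function of the 16 variables, evaluated with a population of 50 for up to 500 generations; the sphere function is used here only because its optimum is known and the result is easy to check.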
In the present study, PSO was used to generate optimized values of the WQI together with all 16 water quality variables, within the variables' lower and upper limits presented in Table 3. Based on the corresponding WQI values, groundwater quality for drinking purposes was classified into five categories (Table 2). To obtain optimized values of the WQI and the water quality variables for the different water quality classes, the WQI was used as the fitness function. The algorithm was run with a population of 50 for a maximum of 500 generations; therefore, a total of 50 × 500 = 25,000 optimized values were generated. The procedure for generating the optimal variable values was as follows:
Step 1: The WQI function was defined as the fitness function, with a population size of 50 and a maximum of 500 generations.
Step 2: The maximum and minimum limits of each variable in the WQI function were set according to Table 3.
Step 3: In every generation, each particle's movement was recorded as a vector containing the WQI value together with the corresponding values of the 16 variables.
Step 4: The category (class) of each variable vector was determined from its corresponding WQI value, as presented in Table 2.
Step 5: A total of 250 variable vectors were selected from each category such that the entire WQI range of that category, as given in Table 2, was covered.
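Steps 3 to 5 can be sketched in Python as follows, assuming each recorded particle position is stored as a (WQI, variable vector) pair. The class thresholds repeat an assumed five-class scheme rather than the actual values of Table 2, and selecting vectors at evenly spaced WQI ranks within each class is only one possible way to cover the entire range of a category; the function names are hypothetical.

```python
import random
from collections import defaultdict


def classify_wqi(wqi):
    """Assumed five-class scheme (hypothetical thresholds, not Table 2)."""
    for upper, label in [(50, "excellent"), (100, "good"), (200, "poor"), (300, "very poor")]:
        if wqi < upper:
            return label
    return "unfit for drinking"


def select_spanning(records, k):
    """Step 5: pick k records at evenly spaced WQI ranks so the class range is covered."""
    ordered = sorted(records, key=lambda r: r[0])
    if len(ordered) <= k:
        return ordered
    if k == 1:
        return ordered[:1]
    step = (len(ordered) - 1) / (k - 1)   # greater than 1 here, so the ranks are distinct
    return [ordered[round(j * step)] for j in range(k)]


def build_class_dataset(records, per_class):
    """Step 4 (label each (WQI, vector) record), then Step 5 (per-class selection)."""
    by_class = defaultdict(list)
    for rec in records:
        by_class[classify_wqi(rec[0])].append(rec)
    return {label: select_spanning(recs, per_class) for label, recs in by_class.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for Step 3: hypothetical recorded (WQI, 16-variable vector) pairs.
    records = [(rng.uniform(0, 400), [rng.random() for _ in range(16)]) for _ in range(25_000)]
    dataset = build_class_dataset(records, per_class=250)
    print({label: len(vecs) for label, vecs in dataset.items()})
```

Because the first and last ranks are always included, the selected vectors of each class reach both the lowest and the highest WQI recorded for that class.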
In every generation, the swarm moved from its previous positions to new ones and produced new fitness values. Each particle's movement in every generation was recorded as a vector containing the WQI value and the corresponding variable values, and each particle's updated fitness value (WQI) was stored in the database together with the related variables. In PSO, the swarm size and the maximum number of iterations (generations) are chosen by the user. The flowchart for this work is shown in Figure 3. The WQI values were classified using a support vector machine and a naive Bayes classifier. Before classification, the dataset was normalized to the range 0 to 1 to improve accuracy, and the variable values in vector form were treated as feature vectors in the normalized dataset.
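A minimal Python sketch of the 0 to 1 normalization is shown below, assuming column-wise min-max scaling (the text does not name the normalization formula). Returning the column minima and maxima allows the same scaling to be reapplied to new feature vectors; mapping a constant column to 0 to avoid division by zero is also an assumption.

```python
def min_max_normalize(rows):
    """Scale every column of a list of feature vectors to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = [
        # A constant column would divide by zero, so it is mapped to 0 (assumption).
        [0.0 if hi == lo else (value - lo) / (hi - lo)
         for value, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]
    return scaled, lows, highs


if __name__ == "__main__":
    data = [[0, 10], [5, 20], [10, 30]]   # hypothetical two-variable vectors
    scaled, lows, highs = min_max_normalize(data)
    print(scaled)  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```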

Classification Using a Support Vector Machine
The SVM classifier [54] is widely used for classification because of its high accuracy and its ability to handle high-dimensional data. Its simplest form is binary classification, which separates objects belonging to a positive (+1) and a negative (−1) class. A support vector machine uses two concepts to distinguish between two classes: (1) maximum-margin separation and (2) the kernel function.
Simple two-dimensional data can be classified using a straight line: points above the line belong to one class, and points below it belong to the other. High-dimensional data can be classified using hyperplanes. However, in binary classification, many planes can separate the data into two classes, so which plane should be selected? The hyperplane that gives the maximum margin is chosen, i.e., the hyperplane whose distance to the nearest data point on each side is maximized. The classification of data with the best-margin hyperplane is shown in Figure 4, which contains two types of data points (filled and unfilled dots) and three planes, H1, H2, and H3. H1 does not classify the data points correctly. Planes H2 and H3 both classify the data points, but H2 gives a smaller margin than H3; therefore, H3 is selected for the classification. Sometimes the data cannot be separated by a hyperplane because of the way they are distributed in the space. In that case, a nonlinear separation is used, which the SVM classifier performs efficiently by means of kernel functions. Nonlinear classification is illustrated in Figure 5, where the two types of objects are identified by solid and hollow dots. These objects cannot be separated by a linear hyperplane; the support vector machine separates them using a kernel function, which implicitly maps the data into a feature space where a linear hyperplane can separate them. In this work, the SVM classifier separated the individual water quality classes with hyperplanes using the radial basis function (Gaussian) kernel [55–58]. The distance of a feature vector from the hyperplanes determines the likelihood that it belongs to a specific class.
The normalized dataset and the class labels were used as inputs in the present study. The dataset was randomly divided 80:20, and the 80% portion was used for training with tenfold cross-validation. In tenfold cross-validation, the training data were divided randomly into ten equal-sized subsamples. In each round, a single subsample was used for validation and the remaining nine subsamples were used for training. This process was repeated ten times, so that each of the ten subsamples was used exactly once for validation. The remaining 20% of the dataset was used for testing and validation.
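The two ingredients described above, the Gaussian (RBF) kernel and the tenfold cross-validation split, can be sketched in Python as follows. The kernel width parameter gamma, the seed, and the helper names are assumptions; training the SVM itself (solving the maximum-margin optimization) is left to a library, for example scikit-learn's SVC with kernel="rbf", and is not shown.

```python
import math
import random


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def kfold_indices(n, k=10, seed=0):
    """Yield (train, validation) index lists; every index is validated exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)           # random assignment to folds
    folds = [idx[f::k] for f in range(k)]      # k folds of (nearly) equal size
    for f in range(k):
        validation = folds[f]
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, validation


if __name__ == "__main__":
    print(rbf_kernel([0.2, 0.4], [0.2, 0.4]))   # identical vectors give 1.0
    for train, validation in kfold_indices(1000, k=10):
        print(len(train), len(validation))      # 900 100 in every round
```

With 1000 training vectors, each of the ten folds holds 100 vectors, so every round trains on 900 vectors and validates on the remaining 100. The kernel value falls from 1 toward 0 as two normalized feature vectors move apart, and gamma controls how quickly it falls.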

Classification Using Naive Bayes Classifier
Naive Bayes classifiers are a family of algorithms based on Bayes' theorem that share a common principle: every pair of features being classified is assumed to be independent of each other given the class. The fundamental naive Bayes assumption is that every feature makes an independent and equal contribution to the outcome. A naive Bayes classifier is a probabilistic machine learning model used for classification tasks. The classifier is based on Bayes' theorem:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \tag{3}$$
Equation (3) gives the probability of event A given that event B has occurred; here, A is the hypothesis and B is the evidence. The classifier assumes that all features are independent, meaning that the presence of one particular feature does not affect another; hence, it is called naive. Before the PSO-NBC analysis, the dataset was normalized to improve the performance of the model. A total of 80% of the dataset was used to train the algorithm, and the remaining 20% was used to assess its prediction accuracy. In this work, the continuous values associated with each feature were assumed to follow a Gaussian (normal) distribution.
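A minimal Gaussian naive Bayes classifier, written from scratch in Python, illustrates how Equation (3) is applied with Gaussian class-conditional likelihoods: each class c receives the score log P(c) + Σj log N(xj; μcj, σ²cj), and the evidence P(B) is dropped because it is the same for every class. This is a sketch, not the study's MATLAB implementation; the small variance smoothing constant and the toy data in the example are assumptions.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes: P(c | x) is proportional to P(c) * prod_j N(x_j; mu, var)."""

    def __init__(self, var_smoothing=1e-9):
        self.var_smoothing = var_smoothing   # added to each variance to avoid division by zero

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        n = len(X)
        self.params = {}
        for label, rows in groups.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            variances = [sum((v - m) ** 2 for v in c) / len(c) + self.var_smoothing
                         for c, m in zip(cols, means)]
            self.params[label] = (math.log(len(rows) / n), means, variances)  # log prior
        return self

    def _log_score(self, row, label):
        log_prior, means, variances = self.params[label]
        log_lik = sum(-0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
                      for x, m, var in zip(row, means, variances))
        return log_prior + log_lik   # log P(c) + sum_j log P(x_j | c); P(x) omitted

    def predict(self, X):
        return [max(self.params, key=lambda c: self._log_score(row, c)) for row in X]


if __name__ == "__main__":
    # Hypothetical normalized vectors for two water quality classes (1 and 5).
    X = [[0.1, 0.2], [0.15, 0.1], [0.2, 0.15], [0.8, 0.9], [0.85, 0.8], [0.9, 0.85]]
    y = [1, 1, 1, 5, 5, 5]
    model = GaussianNaiveBayes().fit(X, y)
    print(model.predict([[0.12, 0.18], [0.88, 0.82]]))  # [1, 5]
```

Working with log probabilities avoids numerical underflow when the per-feature likelihoods of many variables, such as the 16 used here, are multiplied together.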

Water Quality Index (WQI) Analysis of the Field-Based Samples
This section discusses the concentration, distribution, and impact of the physicochemical parameters measured in water samples collected from the Pindrawan tank area. The ranges of concentrations observed for the various parameters and the percentages of samples exceeding the prescribed limits are presented in Table 3, along with their undesirable effects on groundwater quality and human physiology. This section gives an overview of the spatial distribution of the physicochemical parameters measured in the Pindrawan tank area; a more detailed description is provided in Figures A1–A15 in Appendix A. Of the 37 samples, 32.43% had excellent water quality, 43.24% had good water quality, 21.62% had poor water quality, and 2.71% had very poor water quality. The poor and very poor water quality may be due to high concentrations of metals such as Pb and Cr from nearby industries, including mining activities and thermal power plants. The areas corresponding to these WQI values are presented in Figure 6.
Figure 7 presents a correlation plot between the WQI and the parameters measured in the water samples from the study area. Correlations among the independent parameters themselves can be disregarded in this plot, since they are largely empirical and specific to the sampled values. In decreasing order of influence, the parameters are chromium, sodium, fluoride, potassium, chloride, conductivity, total dissolved solids, alkalinity, bicarbonate, and pH; the remaining parameters contributed much less to the overall water quality. Figure 7 indicates that drinking water quality was sensitive to heavy metal concentrations, such as chromium. Based on the WQI, the drinking water quality of the sampled area fell into four categories; no sample was found to be unfit for drinking. Very poor water quality was observed in the Raikheda pond area due to a very high chromium concentration. Poor water quality was observed in significant parts of the Deogaon, Dhansuli, Bangoli, Amlitalab, and Khauna villages. Most areas of all the villages had good water quality. Excellent water quality was observed in the Saragaon, Nilja, Dhansuli, Bangoli, Khauna, Baronda, and Pauni areas. These observations suggest that water quality in most of the study area is satisfactory and that there is no immediate danger to the population. However, the values of certain parameters, such as chromium concentration, total hardness, and total dissolved solids, were alarmingly high in many areas and could worsen, which may significantly affect the water quality of the study area. The concerned authorities should therefore take note of the situation and plan appropriate steps to maintain or improve the drinking water quality in the study area.
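The text does not state which correlation coefficient underlies Figure 7; assuming Pearson's r, a minimal Python sketch of ranking parameters by the strength of their correlation with the WQI could look as follows. The function names and the input layout (one list of values per parameter, aligned with a list of WQI values) are hypothetical, and the inputs must not be constant.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_by_correlation(params, wqi):
    """Return (name, r) pairs sorted by |r| with the WQI, strongest first."""
    scores = [(name, pearson_r(values, wqi)) for name, values in params.items()]
    return sorted(scores, key=lambda s: abs(s[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical values for two parameters measured at four sites.
    params = {"a": [1, 3, 2, 4], "b": [1, 2, 3, 4]}
    print(rank_by_correlation(params, [1, 2, 3, 4]))
```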
Furthermore, the averages and ranges of the values of the different parameters in relation to water quality are presented as boxplots in Figure 8a–p. The concentrations of some parameters, such as alkalinity, chloride, conductivity, chromium, iron, bicarbonate, sodium, and TDSs, were found to be directly proportional to the WQI and to have a much more significant impact on it in the study area. These are therefore the parameters that should be addressed first when aiming to improve water quality in this area. The influences presented in Figure 8a–p reflect the combined effect of each parameter's concentration and its relative weight; therefore, even a parameter with a low relative weight can have a significant impact if its concentration is very high. However, these plots apply strictly to the present study area, and no inferences should be drawn from them for any other samples. Boxplots and correlation plots can be extremely useful for conveying a detailed picture of the water quality of a study area and the influence of different parameters on it.

Result from the PSO-SVM Study
The performance of the PSO-SVM model is presented as a confusion matrix in Figure 9a. The confusion matrix summarizes the model's classification performance on the testing dataset, whose original labels are known. Each row represents the instances in a predicted class, and each column represents the instances in an actual class. In Figure 9a, the rows from top to bottom correspond to the excellent, good, poor, very poor, and unfit for drinking water qualities, respectively, as predicted by the SVM classifier.
The columns from left to right follow the same order for the target classes (actual classes based on the WQI values). Each of these columns contained 50 variable vectors (water quality classes from excellent to unfit for drinking), for a total of 250 variable vectors. The first row contains 50 variable vectors: all 50 vectors of the excellent class were predicted as excellent. Similarly, the second, third, fourth, and fifth rows contain 61, 54, 69, and 16 variable vectors, respectively; that is, the algorithm predicted 61 vectors as good quality, 54 as poor quality, 69 as very poor quality, and 16 as unfit for drinking. The prediction accuracy for each class is presented in the rightmost column. The overall accuracy of the algorithm was 77.60%. Furthermore, a comparison between the classifications based on the actual WQI values and those predicted by the SVM classifier is presented in Figure 9b.
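The overall and per-class accuracies read from a confusion matrix can be sketched in Python as below, using the row-equals-predicted-class convention described for Figure 9a. The matrix in the example is hypothetical and does not reproduce Figure 9a; as a check on the reported figure, an overall accuracy of 77.60% on 250 test vectors corresponds to 0.776 × 250 = 194 correctly classified vectors.

```python
def confusion_summary(matrix):
    """Rows = predicted class, columns = target class (as described for Figure 9a)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    # Per-class accuracy: share of vectors predicted as class i that truly belong to it.
    per_class = [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]
    return correct / total, per_class


if __name__ == "__main__":
    hypothetical = [[50, 0, 0],
                    [5, 40, 5],
                    [0, 10, 40]]
    overall, per_class = confusion_summary(hypothetical)
    print(round(overall, 4), per_class)   # 0.8667 [1.0, 0.8, 0.8]
    print(round(0.776 * 250))             # 194 correct vectors behind 77.60% of 250
```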

Discussion of the PSO-NBC Approach
The PSO-NBC study used the same dataset as the PSO-SVM approach. The test accuracy is discussed using the confusion matrix presented in Figure 10a, in which the rows and columns marked 1 to 5 indicate the excellent (1), good (2), poor (3), very poor (4), and unfit for drinking (5) water qualities. The first row shows that the algorithm assigned 51 variable vectors to the excellent class, whereas there were 50 actual excellent vectors, i.e., at least one vector from another class was misclassified as excellent. Similarly, for the second, third, fourth, and fifth rows (each class having 50 actual variable vectors), the algorithm assigned 57 vectors to good, 46 to poor, 51 to very poor, and 45 to unfit for drinking water quality. The prediction accuracy of the algorithm for each class is presented in the sixth column. The overall accuracy of the algorithm was 92.80%.
The comparisons of the model-predicted outcomes against the actual WQI values are graphically represented in Figure 10b.

Comparison between the PSO-SVM and PSO-NBC Approaches
The performances of the PSO-SVM and PSO-NBC approaches used in the present study are compared in Figure 11. The figure indicates that PSO-SVM predicted some classes (the excellent and poor water categories) with high accuracy, but its performance deviated considerably for the other categories. In contrast, the prediction accuracies of PSO-NBC were high for all classes and did not drop markedly for any particular category. Therefore, a naive Bayes classifier aided by particle swarm optimization can be used efficiently to build a machine learning model that classifies water for drinking purposes.

Conclusions
The process of WQI estimation often involves handling large quantities of data, which can cause significant confusion during the calculation process and make decision making difficult. A machine-learning-based predictive model can assemble the necessary information and predict groundwater quality with significant accuracy. This study aimed to use modern machine learning techniques to predict the quality of drinking water. Groundwater samples collected from parts of the Pindrawan tank command area were used to test and validate the developed model.
1. The calculated WQI values indicated that 32.43% and 43.24% of the water samples from the study area had excellent and good water quality, respectively, while 21.62% and 2.71% had poor and very poor drinking water quality. Very poor water quality was observed in the Raikheda pond area due to a very high chromium concentration. Poor water quality was observed in significant parts of the Deogaon, Dhansuli, Bangoli, Amlitalab, and Khauna villages.
2. The major cation and anion data revealed that all major ions were within the limits, except for potassium, for which 13% of the samples exceeded the limit. However, heavy metal pollution in the area due to mining activities could soon become a cause for concern. A total of 48.6% of the samples from the area exceeded the permissible limit for chromium, which can cause conditions such as hearing loss, blood disorders, hypertension, and, at high levels, death.
3. The study further suggests that ensemble machine learning algorithms can be used to estimate and predict the WQI with significant accuracy. In the present study, particle swarm optimization coupled with a naive Bayes classifier predicted the WQI classes with 92.8% accuracy. Therefore, with the help of a user interface, this algorithm can be used efficiently to estimate the WQI, saving significant effort and time.
The general outcomes of the present research indicate the benefits of ensemble machine learning techniques, in which the outputs of several different algorithms are combined to achieve more accurate predictions. Finally, with the help of a user interface, the algorithm developed in the present study could be applied to water quality estimation in different regions across the globe.
The classification in the present study was carried out on a synthetic dataset generated using particle swarm optimization. The developed approach could be further improved if more real data were available. The authors therefore suggest using a larger field dataset to obtain better accuracy, although this is often difficult given the painstaking process of sample collection and laboratory analysis for all the water quality parameters. The developed algorithm can also be improved by studying its performance and fine-tuning it with different input parameters.
carrying out this research work. A.A. acknowledges the infrastructural support provided by the Indian Institute of Technology Roorkee.

Conflicts of Interest:
The authors declare no conflict of interest.

Figure 1. Map of the study area showing the Pindrawan tank command area's geographical location in Chhattisgarh State, India. The figure shows the location of the study area at the country and state levels, as well as the village boundaries that are under the Pindrawan tank command area, with drinking water sample locations (green points).
In Equations (1) and (2), the parameters c1 (cognitive coefficient) and c2 (social coefficient) are known as the acceleration factors; they represent, respectively, a particle's self-reasoning capability and its ability to acquire information from the current global optimal solution. r1 and r2 are two independent random numbers in the range [0, 1] [53]. pbest and gbest are known as the local optimum (the best-known position of particle i) and the global optimum (the best value obtained by the whole swarm of particles).

Figure 2. Flowchart for the optimization of the particles.

Figure 3. Flowchart describing the workings of the PSO.

Figure 4. Classifications of data using various hyperplanes.

Figure 5. Use of the kernel function in an SVM.

Figure 7. Correlation plot between various groundwater quality parameters.

Figure 9. Comparison between the predicted class and target class using the SVM approach: (a) confusion matrix and (b) column plots.

Figure 10. Comparison between the predicted class and target class using the NBC approach: (a) confusion matrix and (b) column plots.

Figure 11. Comparison of the predicted outcomes using the PSO-SVM and PSO-NBC approaches.
The collected samples were tested for different parameters of water quality and the subsequent values of WQI were computed.Conclusions derived from the present work are as follows:

Figure A16. Flowchart of the procedure followed in the study.

Table 1. Water quality parameters used when calculating the WQI. Based on the corresponding WQI values, the quality of the groundwater for drinking purposes can be classified into five categories, as presented in Table 2.

Table 3. Comparison of chemical parameters with prescribed standards.

Table A1. Locations of the groundwater samples.