Modelling and Prediction of Water Quality by Using Artificial Intelligence

Abstract: Artificial intelligence methods can markedly reduce the costs of water supply and sanitation systems and help ensure compliance with drinking water and wastewater treatment quality requirements. Modelling and predicting water quality to control water pollution has therefore been widely researched. The novelty of the proposed system lies in the efficient monitoring of drinking water to support a sustainable, green environment. In this work, an adaptive neuro-fuzzy inference system (ANFIS) algorithm was developed to predict the water quality index (WQI), and a feed-forward neural network (FFNN) and K-nearest neighbors (KNN) were applied to classify water quality. The dataset has eight significant parameters, of which seven were considered to show significant values. The proposed methodology was developed based on these statistical parameters. Prediction results demonstrated that the ANFIS model was superior for predicting WQI values, whereas the FFNN algorithm achieved the highest accuracy (100%) for water quality classification (WQC). During the testing phase, the ANFIS model achieved a regression coefficient of 96.17% for predicting WQI. The proposed method, which uses advanced artificial intelligence, can aid in water treatment and management.


Introduction
With fast economic growth and increased urbanization, water pollution has become more severe. Understanding the issues and patterns of water quality is critical for reducing and regulating water pollution. Most countries have started to develop environmental water management schemes to better understand the quality of the marine ecosystem. Water is life's most important substance. Although 71% of the Earth's surface is covered with water, the vast majority of it (95%) is salt water [1]. Thus, conserving the quality of fresh water is essential. Almost one billion people lack access to adequate drinking water sources, and two million people die every year from contaminated water and poor sanitation and hygiene [2].
Water quality is important to the sustainability of a diversion scheme. Predicting water quality involves forecasting how the quality of a water system varies at a given time. Water quality prediction is important for water quality planning and regulation. By predicting future changes in water quality at varying levels of contamination, rational strategies can be devised to prevent and regulate water contamination. In water diversion schemes, the general consistency of water should be estimated.
The deep learning model was observed to outperform the traditional support-vector regression (SVR) model. Maiti et al. [27] predicted dissolved oxygen (DO) levels using an ANN model. Deep learning methods showed higher performance in predicting WQI than traditional machine learning techniques, as did AI techniques such as ANN, Bayesian NNs, and adaptive neuro-fuzzy systems [28]. Piazza et al. [29] compared the proposed model's numerical optimization approach with the results of an experimental campaign; a genetic algorithm with a hydraulic simulator was applied to test and evaluate water quality through monitoring. Sambito et al. [30] developed a smart system based on the Internet of Things and a Bayesian decision network (BDN) for predicting wastewater. The proposed system focused on the analysis of soluble conservative pollutants such as metals. Decision support systems and auto-regressive moving averages were applied to predict the water quality (WQ) of groundwater [31].
Currently, water quality is assessed by costly and time-consuming laboratory and statistical analyses that require sample collection, transportation to laboratories, and considerable time and calculation. This is quite ineffective because water is a highly transmissible medium and time is critical if the water is contaminated with disease-causing waste. The catastrophic consequences of water contamination necessitate a faster and less expensive alternative. In this regard, we developed a real-time system to evaluate an alternative approach based on advanced artificial intelligence methods for modelling and predicting water quality. These models, however, face some challenges; for example, they do not consider the factors affecting WQ. The contribution of the current study is an advanced AI adaptive neuro-fuzzy inference system (ANFIS) model developed to predict the water quality index (WQI), together with a feed-forward neural network (FFNN) and KNN used for water quality classification (WQC). This highly efficient AI approach can be generalized and used to forecast the water pollution process, which will aid decision-makers in making timely decisions.

Materials and Methods

Dataset
The datasets used in this research were acquired from different locations in India and contained 1679 samples from 666 different river and lake sources in the country. The data were collected between 2005 and 2014. The datasets include eight important parameters: DO, pH, conductivity, biological oxygen demand (BOD), nitrate, fecal coliform, temperature, and total coliform. However, seven parameters were considered to show significant values, and the developed models were evaluated based on some statistical parameters. All the experiments consisted of temp parameters. The Indian government collected these data to ensure the quality of the drinking water supplied. The dataset was obtained from Kaggle: https://www.kaggle.com/anbarivan/indian-waterquality-data (accessed on 3 December 2020).

Data Preprocessing
The preprocessing phase is very important in data analysis because it improves data quality. In this phase, the WQI was calculated from the most important parameters of the dataset. Then, water samples were classified on the basis of their WQI values. The z-score method was used as a data normalization technique for superior accuracy.

Water Quality Index (WQI) Calculation
The WQI, which is calculated using several parameters that affect WQ [32], was used to measure water quality. The performance of the proposed system was evaluated on the published dataset, with seven important water quality parameters. The WQI was calculated using the following formula:
$$\mathrm{WQI} = \frac{\sum_{i=1}^{N} q_i \, w_i}{\sum_{i=1}^{N} w_i} \qquad (1)$$
where $N$ denotes the total number of parameters included in the WQI formula, $q_i$ denotes the quality estimate scale for each parameter $i$ calculated by Formula (2), and $w_i$ denotes the unit weight of each parameter in Formula (3).
$$q_i = 100 \times \frac{V_i - V_{\mathrm{Ideal}}}{S_i - V_{\mathrm{Ideal}}} \qquad (2)$$
where $V_i$ is the measured value of the water sample tested, $V_{\mathrm{Ideal}}$ is the ideal value for pure water (0 for all parameters except DO = 14.6 mg/L and pH = 7.0), and $S_i$ is the standard value recommended for parameter $i$, as shown in Table 1.
$$w_i = \frac{K}{S_i} \qquad (3)$$
where $K$ denotes the constant of proportionality, which is calculated using Formula (4). Tables 2 and 3 present the unit weights of the parameters and the WQC, respectively. The WQI can be calculated from additional parameters beyond those selected here, because it depends on the variables available in the data. The proposed system can therefore be applied to any parameters with any water quality data.
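The WQI calculation can be illustrated with a minimal Python sketch of the weighted arithmetic index. The function name `wqi`, the dictionary layout, and the standard values in `STANDARD` are hypothetical placeholders for illustration (they are not the values of Table 1). The expression used for K is one common convention and is an assumption; because K appears in every unit weight, it cancels in the final ratio and does not affect the result.

```python
from typing import Mapping

# Ideal values stated in the text: 0 for every parameter except DO and pH.
IDEAL = {"DO": 14.6, "pH": 7.0}


def wqi(measured: Mapping[str, float], standard: Mapping[str, float]) -> float:
    """Weighted arithmetic WQI: sum(q_i * w_i) / sum(w_i)."""
    # Assumed convention: K = 1 / sum(1 / S_i). K cancels in the ratio below,
    # so a different proportionality constant gives the same WQI.
    k = 1.0 / sum(1.0 / s for s in standard.values())
    weighted_sum = 0.0
    weight_total = 0.0
    for name, value in measured.items():
        s = standard[name]
        ideal = IDEAL.get(name, 0.0)
        q = 100.0 * (value - ideal) / (s - ideal)  # quality rating q_i
        w = k / s                                  # unit weight w_i
        weighted_sum += q * w
        weight_total += w
    return weighted_sum / weight_total


# Hypothetical standard values, for illustration only (not the paper's Table 1).
STANDARD = {"DO": 10.0, "pH": 8.5, "conductivity": 1000.0}

# A sample sitting exactly on every standard value has a WQI of about 100;
# a sample at the ideal values has a WQI of about 0.
at_standard = wqi({"DO": 10.0, "pH": 8.5, "conductivity": 1000.0}, STANDARD)
at_ideal = wqi({"DO": 14.6, "pH": 7.0, "conductivity": 0.0}, STANDARD)
```

The two example samples serve as quick sanity checks on the scale of the index before it is applied to real measurements.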

Z-Score Normalization Method
Z-score normalization rescales data using the mean ($\mu$) and standard deviation ($\sigma$) of each parameter. The Z-score was applied to scale parameter values between 0 and 2. It is calculated using the following formula:
$$z = \frac{x - \mu}{\sigma} \qquad (5)$$
where $x$ represents the tested sample in the dataset to be evaluated.
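As a minimal sketch of this normalization step, the following Python snippet standardizes each column of a small array. The function name `zscore`, the example values, and the use of the population standard deviation are assumptions; the study itself was carried out in MATLAB.

```python
import numpy as np


def zscore(data: np.ndarray) -> np.ndarray:
    """Standardize each column: subtract its mean, divide by its std."""
    mu = data.mean(axis=0)
    sigma = data.std(axis=0)  # population std (ddof=0), an assumption
    return (data - mu) / sigma


# Hypothetical measurements: rows are samples, columns are two parameters.
X = np.array([[7.0, 100.0],
              [8.0, 200.0],
              [9.0, 300.0]])
Z = zscore(X)
```

After the transform each column has zero mean and unit standard deviation, so parameters measured on very different scales contribute comparably. A column with constant values would give a zero standard deviation and is not handled in this sketch.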

Adaptive Neuro-Fuzzy Inference System (ANFIS) Model
The ANFIS model is a type of ANN algorithm proposed by Jang [34,35]. The model is used to solve complex and nonlinear problems. Because it combines a neural network with fuzzy logic, the algorithm is powerful. It is used to predict data and obtains the optimal membership functions through an adaptive system in the input layer. The ANFIS model consists of five layers: fuzzification, antecedent, strength normalization, consequent, and inference [36]. Each layer contains many nodes. The ANFIS model is represented with two input parameters and one output parameter, as illustrated in Figure 2. The if-then rules are applied as follows:
Rule 1: if $x$ is $A_1$ and $y$ is $B_1$, then $f_1 = p_1 x + q_1 y + r_1$ (6)
Rule 2: if $x$ is $A_2$ and $y$ is $B_2$, then $f_2 = p_2 x + q_2 y + r_2$
where $x$ and $y$ are the input parameters for node $i$; $A_1$, $A_2$, $B_1$, and $B_2$ are the fuzzy sets; $p_1$, $p_2$, $q_1$, $q_2$, $r_1$, and $r_2$ are the consequent parameters; and $f$ is the output of the ANFIS model.

• Layer 1 (Fuzzification Layer): The first layer implements a membership function to convert the input data into a fuzzy set.
$$\mu_{A_i}(x) = \frac{1}{1 + \left| \dfrac{x - c_i}{\sigma_i} \right|^{2 b_i}}$$
where $\mu(x)$ and $\mu(y)$ are membership functions; $A_i$ is the linguistic variable; and $\sigma_i$, $b_i$, and $c_i$ are the parameters of the bell function.
• Layer 2 (Antecedent Layer): Nodes in the second layer are fixed nodes in which the inputs from the previous layer are multiplied to form the output signal of the second layer.
$$w_i = \mu_{A_i}(x)\, \mu_{B_i}(y), \quad i = 1, 2$$
where the signal $w_i$ refers to the firing strength of the rule.
• Layer 3 (Strength Normalization Layer): The ratio of the $i$th rule's firing strength to the sum of all firing strengths is calculated to normalize the firing strength.
$$O_{3,i} = \bar{w}_i = \frac{w_i}{w_1 + w_2}$$
where $O_{3,i}$ is the output of layer 3 and $\bar{w}_i$ is the normalized firing strength.
• Layer 4 (Consequent Layer): The nodes of the fourth layer are adaptable, and the output of this layer is $O_{4,i}$. The node function of the fourth layer is defined in the following equation:
$$O_{4,i} = \bar{w}_i f_i = \bar{w}_i \left( p_i x + q_i y + r_i \right)$$
where $p_i$, $q_i$, and $r_i$ are the consequent parameters used for the fuzzy inference system function ($f_i$).
• Layer 5 (Inference Layer): This layer is applied to obtain the model output. The final output of the network is described as follows:
$$O_5 = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i}$$
ANFIS is trained with a back-propagation algorithm in which the error between the expected and actual outputs, as well as the error function, is calculated. Weights are updated backward from the fifth layer to the first, and the process continues until the lowest error rate is obtained. Figure 3 shows the framework of an FFNN model for predicting WQI. The data were divided into 70% for the training phase and 30% for the testing phase. The ANFIS model was built on the scatter partition fuzzy approach, which uses clustering to divide the dimension vectors into specific regions for the fuzzy rules. The ANFIS model was developed by integrating fuzzy c-means clustering and backpropagation algorithms. Seven clusters, a minimum improvement of $10^{-5}$, a partition matrix exponent of 2, and 150 epochs were found to be appropriate.
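To make the five layers concrete, the following Python sketch runs one forward pass through a two-input, two-rule ANFIS with generalized bell membership functions. It is illustrative only: the function names and every parameter value are hypothetical, and the training stage (fuzzy c-means clustering plus back-propagation) is not shown.

```python
import numpy as np


def bell(x, sigma, b, c):
    """Generalized bell membership function with width sigma, slope b, center c."""
    return 1.0 / (1.0 + np.abs((x - c) / sigma) ** (2 * b))


def anfis_forward(x, y, premise_a, premise_b, consequent):
    # Layer 1: fuzzification of both inputs, one membership value per rule.
    mu_a = np.array([bell(x, *params) for params in premise_a])
    mu_b = np.array([bell(y, *params) for params in premise_b])
    # Layer 2: firing strength of each rule (product of memberships).
    w = mu_a * mu_b
    # Layer 3: normalized firing strengths.
    w_bar = w / w.sum()
    # Layer 4: weighted first-order consequents f_i = p_i*x + q_i*y + r_i.
    f = np.array([p * x + q * y + r for p, q, r in consequent])
    o4 = w_bar * f
    # Layer 5: overall output is the sum of the weighted consequents.
    return float(o4.sum())


# Hypothetical premise parameters (sigma, b, c) and consequents (p, q, r).
premise_a = [(1.0, 2.0, 0.0), (1.0, 2.0, 1.0)]
premise_b = [(1.0, 2.0, 0.0), (1.0, 2.0, 1.0)]
consequent = [(1.0, 1.0, 0.0), (2.0, -1.0, 0.5)]

out_mid = anfis_forward(0.5, 0.5, premise_a, premise_b, consequent)
out_other = anfis_forward(0.2, 0.9, premise_a, premise_b, consequent)
```

Because the normalized firing strengths are non-negative and sum to one, the output is always a weighted average of the rule consequents, which is a convenient property to check when debugging an implementation.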

Classification of Water Quality
In this section, two classification algorithms, namely KNN and FFNN, are presented.

K-Nearest Neighbors (KNN) Model
The KNN algorithm is a traditional machine learning algorithm used for data classification. It uses the K nearest neighbors of an object to decide its class. The K-value determines how many of the closest points in the feature vectors are considered, and the value should be unique. In this study, three K-values were appropriated to obtain good results. The Euclidean distance function ($D_i$) was applied to find the nearest neighbor in the feature vector.
$$D_i = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
where $x_1$, $x_2$, $y_1$, and $y_2$ are variables for input data.
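A minimal sketch of KNN classification with the Euclidean distance is given below. The function name `knn_predict`, the toy feature vectors, and the class labels are hypothetical, and reading the study's setting as K = 3 is an assumption.

```python
from collections import Counter

import numpy as np


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority class among the k training points nearest to query."""
    # Euclidean distance from the query to every training sample.
    distances = np.sqrt(((train_x - query) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]
    votes = Counter(int(train_y[i]) for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical normalized feature vectors with two well-separated classes.
train_x = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [2.0, 2.0], [2.1, 1.9], [1.9, 2.2]])
train_y = np.array([1, 1, 1, 3, 3, 3])

label_near_origin = knn_predict(train_x, train_y, np.array([0.05, 0.05]))
label_far = knn_predict(train_x, train_y, np.array([2.0, 2.1]))
```

An odd K avoids ties in a two-class vote. With more classes, ties are still possible, and this sketch then returns the tied class that was encountered first.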

Artificial Neural Networks (ANNs)
The artificial neural network is a very powerful computation method that has been used to develop a number of real medical applications [37]. In general, ANN models are used as powerful machine learning algorithms for time series prediction in different engineering applications. An ANN model consists of an input layer, hidden layers, and an output layer. Each hidden layer has weight and bias parameters that manage its neurons. An activation function is used to transfer the data from the hidden layer to the output layer. Learning algorithms select the weights within the NN framework by minimizing a performance measure, such as the mean square error (MSE). Figure 4 shows the architecture of the FFNN for water quality classification (WQC). In this study, the ANN algorithm was used to classify water quality. Five hidden layers with the sigmoid function were considered to transfer the training input from the input layer to the output layer, and the output layer had three classes.
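The structure described here (input layer, sigmoid hidden layers, output layer, MSE as the performance measure) can be sketched as a plain forward pass in Python. The layer widths, the random initial weights, and all function names are assumptions made for illustration; the weight-update step of training is omitted.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_network(sizes, seed=0):
    """Create one (weights, bias) pair per pair of consecutive layers."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.5, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(x, layers):
    """Propagate a batch through every layer with a sigmoid activation."""
    activation = x
    for weights, bias in layers:
        activation = sigmoid(activation @ weights + bias)
    return activation


def mse(predicted, target):
    return float(np.mean((predicted - target) ** 2))


# Assumed sizes: 7 input parameters, 5 hidden layers of 10 neurons, 3 outputs.
sizes = [7, 10, 10, 10, 10, 10, 3]
layers = init_network(sizes)

batch = np.zeros((4, 7))          # four hypothetical normalized samples
scores = forward(batch, layers)   # one score per class for each sample
predicted_class = scores.argmax(axis=1)
```

In a trained network the weights would be adjusted to reduce `mse` between `scores` and one-hot class targets, and the class with the largest score would be reported.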

Performance Measurement
Performance measures such as MSE were applied to evaluate the ability of the proposed model to predict the WQI. Furthermore, accuracy, specificity, sensitivity, precision, recall, and F-score were determined to evaluate the FFNN and KNN algorithms for classifying the WQC. The statistical measures used are defined as follows:
• Mean square error (MSE)
• Root mean square error (RMSE)
• Pearson's correlation coefficient (R)
where R is Pearson's correlation coefficient, x is the observation input data in the first set of the training data, y is the observation input data of the second set of the training data, and n is the total number of input variables.
• Accuracy: $\dfrac{TP + TN}{TP + TN + FP + FN}$
• Specificity: $\dfrac{TN}{TN + FP}$
• Sensitivity: $\dfrac{TP}{TP + FN}$
• Precision: $\dfrac{TP}{TP + FP}$
where TP, TN, FP, and FN are the true positive, true negative, false positive, and false negative, respectively.
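The measures listed above can be computed with a short Python sketch based on their standard textbook definitions. The function names and example numbers are hypothetical, and the definitions are assumed to match those used in the study. For a multi-class problem such as WQC, the counts TP, TN, FP, and FN would typically be taken per class (one class against the rest).

```python
import math


def mse(observed, predicted):
    """Mean of the squared differences between observed and predicted values."""
    n = len(observed)
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n


def rmse(observed, predicted):
    return math.sqrt(mse(observed, predicted))


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def classification_metrics(tp, tn, fp, fn):
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "specificity": tn / (tn + fp),
        "sensitivity": sensitivity,
        "precision": precision,
        "f_score": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Hypothetical counts and values, for illustration only.
metrics = classification_metrics(tp=8, tn=9, fp=1, fn=2)
error = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Multiplying `r` by 100 gives the percentage form in which the correlation is reported in the results.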

Experimental Setup
The empirical results were analyzed in the MATLAB 2020 environment. The simulation was performed by employing a system with an i7 processor and 8 GB RAM to process all required tasks.

Prediction of WQI Using the ANFIS Model
The proposed methodology was validated using 70% of the dataset for training the ANFIS model to predict the WQI. The training results showed that the ANFIS model was highly suitable for predicting WQI. Table 4 summarizes the prediction results of the WQI obtained by the ANFIS model during the training and testing phases. The prediction results showed R% = 92.29%, which demonstrates the highly efficient performance of the proposed system. The prediction results of the ANFIS model in the testing phase showed R% = 92.39%, according to the correlation regression results. Figure 5 displays the time series plot for training and testing, showing that the target and prediction values were very close; the x-axis presents the number of samples, and the y-axis presents the scaling of the data. The time steps run from 2005 to 2014, but we divided the dataset into training and testing sets to validate the system. The data were split with a random function that selects different sample points from the entire dataset. The testing phase uses unseen data and serves to validate the ability of the ANFIS model to predict water quality; in this phase, the algorithm selects random values from the entire dataset to test the model.
The prediction results present the predicted values at the testing stage. According to the evaluation metrics (MSE, RMSE, mean error, and R), the predicted values were very close to the experimental ones. Figure 6 shows the error histogram of the predicted WQI values during the training and testing phases. The errors between the predicted and observed WQI values were computed to generate the error histogram. These prediction errors help determine how the predicted values deviate from the observed values. This is an error histogram for the training phase only. The maximum error was 0.01287 and the average error was 0.1178.
Figure 7 illustrates the regression plot of the ANFIS model for predicting WQI during the training and testing phases. The regression plot was used to determine the correlation between the target and predicted values, where the output values represent the predicted WQI values generated by the ANFIS model. The target values are close to the prediction values. The strong relationship between the observed and predicted values indicates a good model, and the correlation analysis revealed the highly efficient performance of the developed model. The empirical results presented demonstrate the highly efficient performance of the ANFIS model.
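The random 70%/30% division of the samples can be sketched in Python as follows. The function name `random_split`, the seed, and the rounding rule are assumptions; the study performed the split in MATLAB, and its exact procedure is not given.

```python
import numpy as np


def random_split(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and cut them into training and testing parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return order[:n_train], order[n_train:]


# 1679 is the number of samples reported for the dataset.
train_idx, test_idx = random_split(1679)
```

Because the indices are shuffled before the cut, the testing part contains samples from across the whole 2005 to 2014 period rather than only the most recent ones, and no sample appears in both parts.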

Experiment Results of WQC Classification
This section presents the results of the classification algorithms used to predict the WQC. Table 5 shows the results of the FFNN and KNN machine learning algorithms. The FFNN model outperformed the KNN algorithm. The accuracy, specificity, sensitivity, precision, recall, and F-score of the FFNN algorithm were 100%, 99.61%, 99.61%, 99.61%, and 100%, respectively.
Table 5. Performance of the machine learning models used to predict WQC (columns: Models, Accuracy (%), Sensitivity (%), Specificity (%), Precision (%), Recall (%)).
Figure 8 shows the confusion matrix of the FFNN model used to classify WQ. To validate the proposed system, we divided the dataset into 70% training and 30% testing. The numbers of false positives, false negatives, true positives, and true negatives were reported using a confusion matrix. The total number of samples was 1679, and we divided the data into 1119 samples for training, 280 for testing, and 280 for validation. All samples in both phases were classified as true positives. The x-axis values represent the target class and the y-axis values denote the output class. The classes are categorized into 1 (excellent), 2 (good), 3 (poor), and 4 (very poor).
Figure 9 shows the error histogram of the predicted WQI values in the training and testing phases. The maximum error was 0.0228.
The receiver operating characteristic (ROC) was used as a metric to display the FFNN confusion matrix properties, such as true positives and false positives, for WQC. Figure 10 shows the ROC used to measure the validity of the FFNN model on the real standard dataset. Graphs for the testing, training, and validation of the system are presented, and the last graph shows the overall ROC of the system. Notably, the detection rate was very high, and the misclassification rate was very low. The x- and y-axes represent the false positive rate (misclassification) and the true positive rate, respectively. The results demonstrate the highly efficient performance of the FFNN model for WQC.
A performance plot was used to identify the MSE of the network for WQC. The performance of the FFNN model is illustrated in Figure 11. The best validation performance achieved by the FFNN model was 2.24613 × 10^−6 at epoch 52. The MSE decreased rapidly as the FFNN model learned. The blue, green, and red lines represent the training process, validation error, and training error, respectively. Increased numbers of epochs indicate that the training data had small errors. When the validation error stops decreasing, the training stops.
Figure 11. Performance plot of training WQ data using the FFNN model, best performance between 10^0 and 10^−2.
Sustainability 2021, 13, x FOR PEER REVIEW

Discussion
Modelling and predicting water quality play a pivotal role in saving the time and resources consumed by laboratory analysis. Artificial intelligence algorithms were explored as an alternative method to estimate and predict water quality. This study used experimental data of 1679 samples from 666 different rivers and lakes in different states of India. The dataset includes seven selected important parameters: DO, pH, conductivity, BOD, nitrate, fecal coliform, and total coliform. Table 6 compares the results of existing models with those of our proposed system. Various studies have used machine learning models for modelling and predicting WQ. Ahmed et al. [38] applied the FFNN model to predict WQI, using 25 parameters as input data. Gazzaz et al. [39] applied machine learning to predict WQI and considered 23 input parameters. Sakizadeh [40] employed 16 parameters. Rankovic et al. [41] proposed an artificial intelligence model to predict WQ using 10 input parameters. Umair Ahmed et al. [42] used various machine learning models for WQI and WQC, with four parameters as input data; they noted that the polynomial regression model is good for predicting WQI, whereas the multi-layer perceptron (MLP) model is suitable for classifying WQC. Although fewer parameters were used in this investigation, the results of this research are superior to the others. Using few parameters is suitable for expensive real-time systems. In this study, seven significant parameters were used for modelling and predicting WQI, with superior results: a very low prediction error (MSE = 0.00336) and a high correlation regression value (R = 96.17%).
Moreover, using the FFNN model, a system to detect WQC was developed with the highest accuracy (100%). The proposed method uses only seven water quality parameters to predict and classify water quality, and the empirical results confirmed the effectiveness of the model, whereas previous research used machine learning models but achieved lower accuracy.
This system can monitor drinking water and contaminated water with high accuracy. This study suggests that the combined artificial intelligence approach proposed here is a promising tool for accurately simulating water level and quality. The developed model has shown acceptable performance when compared with the available ones, as presented in Table 6.
The ultimate goal of this work is to align directly with Sustainable Development Goal (SDG) 6, which aims to ensure access to clean water for all. The developed model can be used easily and inexpensively to predict the water quality index and thus the water quality class with high accuracy. In addition, this kind of model is robust and can forecast water contamination, and it can thus guide the authorized governments and agencies in developing effective strategies for better water sustainability and management, either by removing the contamination source or by seeking an alternative source of pure water to meet community demand.

Conclusions
Modelling and predicting water quality using AI algorithms is very important for protecting the environment. Artificial intelligence models were developed to predict and classify drinking water quality by employing river data collected from different locations in Indian states. The WQI was calculated from seven important parameters: DO, pH, conductivity, biological oxygen demand, nitrate, fecal coliform, and total coliform. These were considered significant parameters for water quality. Developing new methodologies using advanced AI ANFIS algorithms can help ensure a safe environment. In the proposed methodology, advanced ANFIS algorithms were used to predict WQI, and the FFNN algorithm was used to classify the WQC data. The proposed methodology was statistically evaluated and tested. The following conclusions can be drawn from using advanced AI to monitor WQ:
• First, the present study explored an alternative artificial intelligence method to predict water quality by employing minimal and available water quality parameters. The datasets used in the research were acquired from different locations in India and contained 1679 samples from 666 different river and lake sources in the country. Artificial intelligence models were applied to predict and classify WQI.
• Fourth, the system will help reduce people's consumption of poor-quality water and consequently curtail horrific diseases such as typhoid and diarrhea. In this case, our application can help reduce water pollution in different water bodies.
The robustness and efficiency of the proposed model in predicting WQI can be examined in future works. The developed models can be implemented to predict the quality of different types of water in Saudi Arabia.
Informed Consent Statement: Not Applicable.