# Modelling and Prediction of Water Quality by Using Artificial Intelligence

## Abstract

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Dataset

#### 2.2. Data Preprocessing

#### 2.2.1. Water Quality Index (WQI) Calculation

Here, $q_i$ denotes the quality estimate scale for each parameter $i$ calculated by Formula (2), and $w_i$ denotes the unit weight of each parameter in Formula (3).

Here, $V_i$ is a measured value that refers to the water samples tested, $V_{ideal}$ is an ideal value and indicates pure water (0 for all parameters except DO = 14.6 mg/L and pH = 7.0), and $S_i$ is a standard value recommended for parameter $i$, as shown in Table 1.
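
The quantities defined above match the widely used weighted arithmetic form of the index. The following minimal sketch assumes that form, namely $q_i = 100\,(V_i - V_{ideal})/(S_i - V_{ideal})$, $w_i = K/S_i$ with $K = 1/\sum_i (1/S_i)$, and $WQI = \sum_i q_i w_i / \sum_i w_i$. These formulas are an assumption, not a quotation of Formulas (1) to (3). The permissible limits come from the table of permissible limits; the function names and the sample measurement are hypothetical.

```python
# Permissible limits S_i, taken from the table of permissible limits.
STANDARD = {
    "do": 10.0,               # dissolved oxygen, mg/L
    "ph": 8.5,
    "conductivity": 1000.0,   # µS/cm
    "bod": 5.0,               # biological oxygen demand, mg/L
    "nitrate": 45.0,          # mg/L
    "fecal_coliform": 100.0,  # per 100 mL
    "total_coliform": 1000.0, # per 100 mL
}

# Ideal values: 0 for all parameters except DO = 14.6 mg/L and pH = 7.0.
IDEAL = {name: 0.0 for name in STANDARD}
IDEAL["do"] = 14.6
IDEAL["ph"] = 7.0


def unit_weights(standard):
    """Assumed form w_i = K / S_i with K = 1 / sum(1 / S_i)."""
    k = 1.0 / sum(1.0 / s for s in standard.values())
    return {name: k / s for name, s in standard.items()}


def quality_rating(name, measured):
    """Assumed form q_i = 100 * (V_i - V_ideal) / (S_i - V_ideal)."""
    return 100.0 * (measured - IDEAL[name]) / (STANDARD[name] - IDEAL[name])


def wqi(sample):
    """Weighted arithmetic WQI: sum(q_i * w_i) / sum(w_i)."""
    weights = unit_weights(STANDARD)
    weighted = sum(quality_rating(n, v) * weights[n] for n, v in sample.items())
    return weighted / sum(weights[n] for n in sample)


# Hypothetical measurement, for illustration only.
sample = {
    "do": 6.7, "ph": 7.5, "conductivity": 203.0, "bod": 2.0,
    "nitrate": 0.1, "fecal_coliform": 11.0, "total_coliform": 27.0,
}
print({name: round(w, 4) for name, w in unit_weights(STANDARD).items()})
print(round(wqi(sample), 2))
```

Under this assumption, the weights computed from the permissible limits round to the unit weights listed in the unit-weight table (for example, 0.4426 for biological oxygen demand), which suggests that the assumed form of $w_i$ is consistent with the values used in the study.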

#### 2.2.2. Z-Score Normalization Method

#### 2.3. Adaptive Neuro-Fuzzy Inference System (ANFIS) Model

- Layer 1 (Fuzzification Layer):

is the linguistic variable; and $\sigma_i$, $b_i$, and $c_i$ are the parameters of the Bell function.

- Layer 2 (Antecedent Layer):

signal refers to the firing strength of the rule.

- Layer 3 (Strength Normalization Layer):

is the output of layer 3 and $\overline{w}$ is the normalized firing strength.

- Layer 4 (Consequent Layer):

- Layer 5 (Inference Layer):
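
The five layers listed above can be sketched as a single forward pass. This is a minimal illustration, not the configuration used in the study: it assumes a first-order Sugeno system with two inputs and two rules, a generalized bell membership function of the form $1/(1 + |(x - c_i)/\sigma_i|^{2 b_i})$, and rule $i$ pairing the $i$-th membership function of each input. All parameter values and function names are hypothetical.

```python
def bell(x, sigma, b, c):
    """Generalized bell membership function (assumed form for Layer 1)."""
    return 1.0 / (1.0 + abs((x - c) / sigma) ** (2 * b))


def anfis_forward(x1, x2, premise_x1, premise_x2, consequent):
    """Forward pass of a two-input, first-order Sugeno ANFIS.

    premise_x1, premise_x2: one (sigma, b, c) tuple per rule.
    consequent: one (p, q, r) tuple per rule, giving f = p*x1 + q*x2 + r.
    """
    # Layer 1: fuzzification of each input.
    mu_a = [bell(x1, *params) for params in premise_x1]
    mu_b = [bell(x2, *params) for params in premise_x2]
    # Layer 2: firing strength of each rule (product of memberships).
    w = [a * b for a, b in zip(mu_a, mu_b)]
    # Layer 3: normalized firing strengths.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: each rule's weighted linear consequent.
    layer4 = [wb * (p * x1 + q * x2 + r)
              for wb, (p, q, r) in zip(w_bar, consequent)]
    # Layer 5: overall output is the sum of all rule outputs.
    return sum(layer4)


# Hypothetical parameters, for illustration only.
premise_x1 = [(1.0, 2.0, 0.0), (1.0, 2.0, 1.0)]
premise_x2 = [(0.5, 1.0, 0.0), (0.5, 1.0, 1.0)]
consequent = [(1.0, 1.0, 0.0), (2.0, -1.0, 0.5)]
print(anfis_forward(0.3, 0.7, premise_x1, premise_x2, consequent))
```

Because the normalized firing strengths sum to one, the output is a weighted average of the rule consequents; training adjusts the premise and consequent parameters so that this average tracks the target WQI.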

partition matrix exponent 2, and number epoch 150 were appropriate.

#### 2.4. Classification of Water Quality

#### 2.4.1. K-Nearest Neighbors (KNN) Model

$x_1$, $x_2$, $y_1$, and $y_2$ are variables for input data.
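
The variables above suggest a distance between two points, which is the core of KNN. The sketch below assumes the Euclidean distance $d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$ and a majority vote among the $k$ nearest training samples. The distance choice, the value of `k`, the function names, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy two-feature data, for illustration only.
train_x = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.9, 0.8), (0.8, 0.9), (0.85, 0.95)]
train_y = ["good", "good", "good", "poor", "poor", "poor"]
print(knn_predict(train_x, train_y, (0.2, 0.2), k=3))
```

In practice the features would be the normalized water quality parameters, and `k` would be chosen by validation rather than fixed in advance.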

#### 2.4.2. Artificial Neural Networks (ANNs)

#### 2.5. Performance Measurement

- Mean square error (MSE): $$MSE=\frac{1}{N}\sum_{i=1}^{N}\left(y_i-\widehat{y}_i\right)^2$$

- Root mean square error (RMSE): $$RMSE=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i-\widehat{y}_i\right)^2}$$

- Pearson correlation coefficient (R): $$R=\frac{n\sum xy-\left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2-\left(\sum x\right)^2\right]\left[n\sum y^2-\left(\sum y\right)^2\right]}}\times 100\%$$

- Accuracy: $$Accuracy=\frac{TP+TN}{TP+FP+FN+TN}\times 100\%$$

- Specificity: $$Specificity=\frac{TN}{TN+FP}\times 100\%$$

- Sensitivity: $$Sensitivity=\frac{TP}{TP+FN}\times 100\%$$

- Precision: $$Precision=\frac{TP}{TP+FP}\times 100\%$$

- F-score: $$F\text{-}score=\frac{2\times Precision\times Sensitivity}{Precision+Sensitivity}\times 100\%$$
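
The measures listed above can be computed directly from predictions and confusion-matrix counts. The sketch below is a plain-Python illustration under two assumptions: R is taken to be the standard Pearson correlation coefficient (with the square root in the denominator), and precision and sensitivity enter the F-score as fractions before the result is scaled to a percentage. Function names and the example counts are hypothetical.

```python
import math


def mse(y_true, y_pred):
    """Mean square error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(mse(y_true, y_pred))


def pearson_r(x, y):
    """Pearson correlation coefficient as a fraction (multiply by 100 for %)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, specificity, sensitivity, precision and F-score, in percent."""
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "accuracy": 100 * (tp + tn) / (tp + fp + fn + tn),
        "specificity": 100 * tn / (tn + fp),
        "sensitivity": 100 * sensitivity,
        "precision": 100 * precision,
        "f_score": 100 * 2 * precision * sensitivity / (precision + sensitivity),
    }


# Hypothetical confusion-matrix counts, for illustration only.
print(classification_metrics(tp=8, tn=9, fp=1, fn=2))
```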

## 3. Experimental Setup

#### 3.1. Prediction of WQI Using the ANFIS Model

#### 3.2. Experiment Results of WQC Classification

at epoch 52. During training of the FFNN model, the MSE decreased rapidly as the network learned. The blue, green and red lines represent the training process, validation error and training error, respectively. The error on the training data became smaller as the number of epochs increased. Training stops when the validation error stops decreasing.
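
The stopping rule described above is commonly implemented as early stopping on the validation error. The sketch below illustrates the idea on a precomputed list of per-epoch validation errors; the `patience` parameter (how many non-improving epochs are tolerated) and the error values are hypothetical and are not taken from the study.

```python
def early_stopping_epoch(val_errors, patience=6):
    """Return (best_epoch, best_error) under validation-based early stopping.

    Training is considered stopped once the validation error has failed to
    improve for `patience` consecutive epochs.
    """
    best_error = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, error in enumerate(val_errors):
        if error < best_error:
            best_error, best_epoch, waited = error, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation error stopped improving
    return best_epoch, best_error


# Hypothetical validation curve, for illustration only.
val_errors = [0.5, 0.3, 0.2, 0.25, 0.26, 0.27]
print(early_stopping_epoch(val_errors, patience=3))
```

The weights kept are those from the epoch with the lowest validation error, which guards against overfitting the training data.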

## 4. Discussion

## 5. Conclusions

- First, the present study explored artificial intelligence as an alternative method for predicting water quality from a minimal set of readily available water quality parameters. The dataset was acquired from different locations in India and contained 1679 samples from 666 different river and lake sources in the country. Artificial intelligence models were applied to predict the WQI and to classify water quality.
- Second, an ANFIS model can be developed to predict WQI by selecting important parameters from a standard dataset. Notably, the predicted values were very close to the observed values.
- Third, machine learning algorithms, namely, FFNN and KNN, can be developed for WQC. The FFNN outperformed KNN, with a classification accuracy of 100% compared with 80.63%.
- Fourth, the system can help reduce people’s consumption of poor-quality water and consequently curtail serious diseases such as typhoid and diarrhea. In this way, our application can support efforts to reduce water pollution in different water bodies. The robustness and efficiency of the proposed model in predicting WQI can be examined in future work. The developed models can also be applied to predict the quality of different types of water in Saudi Arabia.

Architecture of the adaptive neuro-fuzzy inference system (ANFIS) model: (A) order of Sugeno and (B) layers of the ANFIS model.

Error histogram of the FFNN model for WQC; the errors range from −0.0228 to 0.9749.

Performance plot of training WQ data using the FFNN model; best performance between 10 and 10^{−2}.

Permissible limits of the parameters used in calculating the WQI.

Parameters | Permissible Limits |
---|---|
Dissolved oxygen, mg/L | 10 |
pH | 8.5 |
Conductivity, µS/cm | 1000 |
Biological oxygen demand, mg/L | 5 |
Nitrate, mg/L | 45 |
Fecal coliform/100 mL | 100 |
Total coliform/100 mL | 1000 |

Water Quality Index Range | Classification |
---|---|
0–25 | Excellent |
26–50 | Good |
51–75 | Poor |
76–100 | Very poor |
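
The ranges in the table above translate directly into a lookup from a WQI value to its class. The sketch below is illustrative only: the function name is hypothetical, fractional values between the listed integer ranges (for example, 25.5) are assumed to fall into the next class up, and values outside 0 to 100 raise an error because the table does not define a class for them.

```python
def classify_wqi(wqi):
    """Map a WQI value to its class using the ranges in the table."""
    if wqi < 0 or wqi > 100:
        # The table does not define a class outside the 0-100 range.
        raise ValueError(f"WQI {wqi} is outside the tabulated range 0-100")
    if wqi <= 25:
        return "Excellent"
    if wqi <= 50:
        return "Good"
    if wqi <= 75:
        return "Poor"
    return "Very poor"


for value in (12.0, 25.5, 60.0, 88.0):
    print(value, classify_wqi(value))
```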

Parameter | Unit Weight (w_{i}) |
---|---|
Dissolved Oxygen | 0.2213 |
pH | 0.2604 |
Conductivity | 0.0022 |
Biological Oxygen Demand | 0.4426 |
Nitrate | 0.0492 |
Fecal Coliform | 0.0221 |
Total Coliform | 0.0022 |

Model | Training MSE | Training RMSE | Training Mean Errors | Training R (%) | Testing MSE | Testing RMSE | Testing Mean Errors | Testing R (%) |
---|---|---|---|---|---|---|---|---|
ANFIS | 0.00336 | 0.0580 | 6.456 × 10^{−9} | 90.29 | 0.0029 | 0.0540 | 0.001330 | 92.39 |

Models | Accuracy (%) | Sensitivity (%) | Specificity (%) | Precision (%) | Recall (%) |
---|---|---|---|---|---|
FFNN | 100 | 99.61 | 99.61 | 99.961 | 100 |
KNN | 80.63 | 82.50 | 89.50 | 82.50 | 86.84 |

Authors | Water Body | Place of Study | Model for Prediction of WQI | Model for Classification of WQC | Number of Parameters | Purpose of System: WQI | Purpose of System: WQC | Results of WQI: MSE | Results of WQI: R (%) | Results of WQC: Accuracy (%) |
---|---|---|---|---|---|---|---|---|---|---|
Ahmed et al. [38] | River | Malaysia | FFNN | Not used | 25 | Yes | No | 0.1156 | 97.7 | No |
Gazzaz et al. [39] | River | Malaysia | ANNs | Not used | 23 | Yes | No | 9.25 | 77.0 | No |
Sakizadeh et al. [40] | Groundwater | Iran | ANNs | Not used | 16 | Yes | No | 9.25 | 77.0 | No |
Rankovic et al. [41] | River | Serbia | FFNN | Not used | 10 | Yes | No | 0.9923 | 87.4 | No |
Umair Ahmed et al. [42] | River | Pakistan | Polynomial regression | Multi-layer perceptron (MLP) | 4 | Yes | Yes | 7.9467 | - | 85.07 |
Proposed system | Rivers and lakes | India | ANFIS | FFNN | 7 | Yes | Yes | 0.0029 | 96.17 | 100 |

