An Enhanced Water Quality Index for Water Quality Monitoring Using Remote Sensing and Machine Learning

Abstract: Water quality deterioration is a serious problem that grows with the rate of urbanization. Conventional water quality monitoring relies on grab sampling of physico-chemical parameters and a water quality index method to assess water quality. Both processes are lengthy and expensive. Traditional indices are also biased towards the physico-chemical parameters because samples are collected only from certain sampling points. These limitations make the current water quality index methods unsuitable for general application to any water body. Thus, we develop an enhanced water quality index method based on a semi-supervised machine learning technique to determine water quality. This method follows five steps: (i) parameter selection, (ii) sub-index calculation, (iii) weight assignment, (iv) aggregation of sub-indices and (v) classification. Physico-chemical, air, meteorological, hydrological and topographical parameters are acquired for the stream network of the Rawal watershed. Min-max normalization is used to obtain sub-indices, and weights are assigned with tree-based techniques, i.e., LightGBM, Random Forest, CatBoost, AdaBoost and XGBoost. As a result, the proposed technique removes the uncertainties in traditional indexing with a 100% classification rate and removes the necessity of including all parameters for classification. Electrical conductivity, Secchi disk depth, dissolved oxygen, lithology and geology are amongst the highest-weighted parameters when using LightGBM and CatBoost, which reach 99.1% and 99.3% accuracy, respectively. Seasonal variations are also observed for the classified stream network, with the medium-to-bad class ratio shifting from 55:45 (January) to 10:90 (December). This supports the validity of the proposed method, which can contribute to water management planning globally.


Introduction
Water is an essential resource for the sustenance of all living organisms. Being a significant resource, its quality needs to be monitored and managed properly. However, it is constantly being affected by anthropogenic activities caused by growing urbanization. Factors such as soil erosion and climate change can have a huge impact on the physical landscape of water bodies. These factors are usually ignored while assessing water quality, and traditionally, only the physico-chemical parameters, such as turbidity, conductivity and pH are used. However, such factors are not enough to accurately analyse the conditions that can have an impact on the water body. Thus, topographical and hydrological parameters, such as slope, aspect, lithology, geology and soil type, may have a direct/indirect impact on the overall quality of the water body. Similarly, the abundance of air pollutants such as nitrogen and carbon dioxide can cause water eutrophication [1], acidification [2] and nutrient pollution [3] that can be harmful for the aquatic ecosystem. Moreover, heavy precipitation can also directly affect the water cycle resulting in the deterioration of the water bodies [4].
In reality, the monitoring of multiple contamination sources is a tedious and expensive task that involves field visits and laboratory work. Many researchers use remote sensing and machine learning technology to overcome such challenges [5]. This, in turn, can make the sampling process more robust and economical. In fact, such technology can be utilized to assess water quality based on the combined impact of different parameters, which can be a complex task otherwise. Traditionally, a water quality index (WQI) is a weighted average of selected ambient concentrations of pollutants, providing a single number that represents the overall water quality at a certain location and time. The most frequently used WQIs include the National Sanitation Foundation Water Quality Index (NSFWQI) [6], the Canadian Council of Ministers of the Environment Water Quality Index (CCME) [7] and the Oregon Water Quality Index (OWQI) [8]. However, the application of the WQIs on water samples is a biased approach, as each index is built specific to certain locations or water types, is sensitive to specific parameter concentrations or depends on the weights assigned [9]. Such limitations make the traditional WQIs unsuitable for application on any general water body.
To overcome these challenges, this research proposes an enhanced water quality index (EWQI) that uses machine learning and data mining methods to analyse the combined impact of different factors, including hydrological, topographical, meteorological, air and physico-chemical parameters, and assigns appropriate weights using tree-based methods, i.e., CatBoost and LightGBM. Thus, a machine learning approach is proposed as a replacement WQI that can remove the bias and can be applied to any water body regardless of the selected parameters. A total of twenty-two parameters are extracted for the time period of July 2018 to August 2022: seven water quality parameters, i.e., total dissolved solids (TDS), pH, electrical conductivity (EC), Secchi disk depth (SDD), dissolved oxygen (DO), turbidity (Tur) and chlorophyll-α (chl-α), acquired from the Sentinel-2 Multispectral Imager (S2-MSI) Level 1C (L1C) satellite; six air pollutants, i.e., carbon monoxide (CO), nitrogen dioxide (NO₂), ozone (O₃), sulphur dioxide (SO₂), formaldehyde (HCHO) and methane (CH₄), acquired from the Sentinel-5 Precursor Level 2 (S5P-L2) TROPOspheric Monitoring Instrument (TROPOMI); three meteorological parameters, namely air temperature, wind speed and total precipitation, taken from the ERA5 Climate Reanalysis Project (ERA5-CRP); and six hydrological and topographical parameters, i.e., slope, aspect, soil type, lithology, geology and land use/land cover, acquired from the Digital Elevation Model (DEM) created with Shuttle Radar Topography Mission (SRTM) data. The NSFWQI method, which uses the water quality parameters for evaluation, is used to compare the quality of the Rawal stream network with the newly proposed EWQI, which is based on the extracted twenty-two parameters. Moreover, a remote sensing and machine learning approach can help in analysing the different factors affecting water quality in a way that is applicable on a global scale.
This research reveals that the newly proposed EWQI is a more reliable and accurate index than the state-of-the-art NSFWQI method as it: (i) operates well with or without missing parameters, (ii) identifies the temporal and seasonal variations, and (iii) considers all other environmental factors while classifying the water body. The major contributions of this study are as follows:

1. Twenty-two parameters are extracted for the stream network of the Rawal watershed, comprising seven water quality parameters, six air pollutants, three meteorological parameters and six hydrological/topographical parameters, for the monsoon months of June to September of the years 2018 to 2022.

2. A multimodal indexing technique, EWQI, is proposed that involves five steps: parameter selection, sub-index calculation, weight assignment, aggregation of sub-indices and classification. It uses a machine learning approach for sub-index calculation and weight assignment, and remote sensing technology to extract the twenty-two multimodal parameters for parameter selection.
This paper is organized as follows: Section 2 discusses the related work. Section 3 explains the proposed EWQI. Section 4 covers the proposed methodology for the extraction of the twenty-two parameters, and the application of the EWQI method is discussed for the Rawal stream network. The results of the comparison between NSFWQI and EWQI are discussed in Section 5. In Section 6, the conclusion of this research is presented.

Literature Review
WQIs that are based on physico-chemical and biological parameters are used for monitoring the quality of water at different locations, such as the United Kingdom [10], Dalmatia [11], Zimbabwe [12], Argentina [13] and India [14]. Over the years, a number of water quality indices have been proposed that first convert raw parameter concentrations into a sub-index or quality rating (q) value and then aggregate these sub-indices to obtain a final water quality index value [15]. This value lies in the range of 0 to 100 and is classified accordingly [16]. Among the most commonly used WQIs are NSFWQI, CCME, OWQI, weighted arithmetic WQI (WAWQI) and minimum operator index (MOI) [17]. The classification and number of parameters used for these indices are given in Table 1. WAWQI and NSFWQI use the unit weight (w) and q of the nth parameter to calculate the final WQI value, as seen in Table 1. The CCME is based on: (1) scope F_1, (2) frequency F_2 and (3) amplitude F_3.

Table 1 (partial; only the following rows and notes are recoverable) [17]
WQI range: Classification
80 to 100: Eminently suitable for all uses
60 to 79: Suitable for all uses
40 to 59: Main use may be compromised
20 to 39: Unsuitable for several uses
0 to 19: Totally unsuitable for many uses
Notes: n = number of parameters; SI_n = sub-index for the nth parameter.

Some of these indices use expert opinions in identifying important parameters, weight assignment and transformation to sub-indices [10]. Other development techniques include fuzzy inference [19] and the Delphi method, which is used in NSFWQI, OWQI and the index of water quality (IWQ) [20]. However, the common attribute amongst these indices is the use of physico-chemical variables. Of the parameters used, 6% are biological, 24% are physical, and 70% are chemical [21]. Amongst them, DO and total coliforms have an 87% selection rate [22]. Biological oxygen demand and pH are selected at a 73% rate [23]. Temperature, Tur, ammonia and TDS have a 47% selection rate [24].
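For orientation, the forms in which three of these indices are commonly written in the literature are sketched below. These are the standard textbook expressions and are not a reproduction of Table 1 of this paper. The symbols w_n, q_n and F_1 to F_3 follow the text above, while N (the number of parameters) is notation introduced here for illustration. The NSFWQI is shown in its additive form; a multiplicative variant also exists.

```latex
\begin{align*}
\text{WAWQI}    &= \frac{\sum_{n=1}^{N} w_n \, q_n}{\sum_{n=1}^{N} w_n} \\[4pt]
\text{NSFWQI}   &= \sum_{n=1}^{N} w_n \, q_n, \qquad \sum_{n=1}^{N} w_n = 1 \\[4pt]
\text{CCME WQI} &= 100 - \frac{\sqrt{F_1^{2} + F_2^{2} + F_3^{2}}}{1.732}
\end{align*}
```

All three depend on a fixed parameter set and on predefined weights or objectives, which is the sensitivity discussed in the remainder of this section.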
The problem identified for most of these indices is that they are very sensitive to the parameters involved in classifying a water body. Even a single parameter with a slightly high concentration value can affect the index classification [9]. Studies have used grab sampling or data acquisition from government authorities to analyze physico-chemical parameters such as pH [25], conductivity [26], hardness [27], and phosphate [28], and the WQI is calculated to identify the underlying issues.
The literature reveals that the traditional indices are based on specific physico-chemical water parameters and thus have limitations that make these indices unsuitable for worldwide use. The uncertainty of the WQIs makes them unpredictable for complex environmental situations [29]. These indices are biased to a set of parameters, place, area and purpose of use. The dynamic nature of the water body can cause certain changes in the physico-chemical properties [30]. Moreover, the influence of air pollutants, meteorological features and hydrological features on the aquatic ecosystems is ignored in the development of the WQI method [31]. These challenges indicate that most WQIs fail to accurately classify a water body. Therefore, there is a need for a universally accepted index that removes the uncertainties and bias in the traditional standards.

Enhanced Water Quality Index
The WQI development method involves five common steps [32] that include parameter selection, sub-index calculation, weight assignment, sub-indices aggregation and classification. To enhance the methodology involved in the development of this technique, machine learning methods are used. Moreover, instead of using the traditional water quality standards for weight assignment, a tree-based scoring technique is used. For the development of the EWQI, the methods are trained on the training set, and the best performing technique is applied on the test set. The process is further described in detail as follows:

Parameter Selection
Most WQI development techniques involve subjective methods for selection of parameters that include water regulatory organizations, the Delphi method and expert opinion. Multiple parameters are involved in the calculation of a single WQI value. These mostly include the physico-chemical characteristics of the water bodies. Generally, these parameters are not enough for assessing the water quality of any water body. Certain parameters, such as hydrological, air and meteorological variables, can influence water quality in a wider manner and cannot be neglected in the calculation of the WQI value.

Sub-Index Calculation
This step transforms the different parameters to a uniform scale, since each parameter has a different unit. For example, the physico-chemical parameters DO and chl-α are measured in mg/L. Air pollutants are measured in mol/m² for NO₂, SO₂, CO, O₃ and HCHO, and in parts per million (ppm) for CH₄. The meteorological parameters are measured in m s⁻¹ for wind speed and in Kelvin for air temperature. Similarly, slope is given in % and aspect in degrees. Traditionally, the transformation of parameters is performed with linear or non-linear functions, fuzzy membership or expert opinion, which may involve national and international standards. These standards are applied to a formula to obtain a sub-index value in the range of 0 to 100. In machine learning, the normalization of parameters is a common data preprocessing step, and the most used normalization is the min-max method. Here, this technique is applied to the preprocessed training data to transform the values to a 0-100 range. The new sub-index formula is given in Equation (1), where q = sub-index value, v = parameter value, max_A = maximum value of the parameter, and min_A = minimum value of the parameter.

\[ q = \frac{v - \min_A}{\max_A - \min_A} \times 100 \tag{1} \]
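The sub-index step can be illustrated with a minimal Python sketch of min-max scaling to the 0-100 range. The function name `min_max_sub_index`, the sample readings and the handling of a constant parameter are assumptions made for illustration, as is the choice to reuse the training minimum and maximum when scaling test values; the text does not specify these details.

```python
def min_max_sub_index(values, lo=None, hi=None):
    """Scale raw parameter values to 0-100 sub-indices with min-max normalization.

    lo/hi default to the min/max of `values`; pass them explicitly to reuse
    the bounds of another set (an assumption of this sketch, not the paper).
    """
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        # A constant parameter has no spread to scale; return 0 for every
        # sample (a choice made for this sketch only).
        return [0.0 for _ in values]
    return [100.0 * (v - lo) / (hi - lo) for v in values]


# Hypothetical dissolved-oxygen readings in mg/L (illustrative values only).
do_train = [4.0, 6.0, 8.0]
q_train = min_max_sub_index(do_train)

# Scale an unseen reading with the training bounds.
q_test = min_max_sub_index([5.0], lo=min(do_train), hi=max(do_train))
```

Because every parameter ends up on the same 0-100 scale, values in mg/L, mol/m² or degrees can be combined in a single weighted index.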

Weight Assignment
This step assigns a weight to each parameter. Previously, WQI calculation involved assigning unequal weights to parameters [6] or giving equal weights or no weights [7]. Usually, this is accomplished by rating the variables on a 1-5 scale: high priority variables are given a weighting of five and low priority variables a value of one. Then, relative weights are computed. This method is known as the ranking method. Other weighting techniques are expert opinion, fuzzy inference and the Delphi method. Such weighting methods can introduce bias and depend on the inclusion of all the selected parameters; any missing parameter will directly affect the resultant WQI value. In our index, this process is replaced by a semi-supervised technique that first clusters the data and then applies a tree-based algorithm to the clustered data. The K-means clustering method is applied on the training data, and the Elbow method is used to obtain the number of clusters, K. Once the training set is clustered, tree-based feature importance scores are calculated. Five tree-based feature importance methods, i.e., XGBoost (XGB), Random Forest (RF), LightGBM (LGBM), CatBoost (CatB) and AdaBoost (AdaB), are used to obtain scores for the parameters, and then the relative weights are computed. The final weights with the highest accuracy are used to calculate the EWQI on the test set. Equation (2) gives the relative weight W_n of the nth parameter, where S_n is the score of the nth parameter and the sum runs over all selected parameters.

\[ W_n = \frac{S_n}{\sum_{k} S_k} \tag{2} \]
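The last two steps of this procedure, choosing the weighting method with the highest accuracy and converting its scores into relative weights, can be sketched as follows. This is an illustration under stated assumptions: relative weights are taken to be each score divided by the sum of all scores, and the method names, accuracies and scores in `candidates` are hypothetical placeholders rather than values from the study. In practice the scores would come from the feature importances of a fitted tree-based model.

```python
def relative_weights(scores):
    """Convert raw importance scores S_n into relative weights that sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("importance scores must sum to a positive value")
    return {name: s / total for name, s in scores.items()}


# Hypothetical results for two weighting methods (not values from the paper).
candidates = {
    "method_a": {"accuracy": 0.90, "scores": {"EC": 30.0, "SDD": 20.0, "DO": 50.0}},
    "method_b": {"accuracy": 0.95, "scores": {"EC": 2.0, "SDD": 1.0, "DO": 1.0}},
}

# Keep the scores of the method that classified the clustered data best.
best = max(candidates, key=lambda m: candidates[m]["accuracy"])
weights = relative_weights(candidates[best]["scores"])
```

Normalizing by the sum makes weights from different methods comparable, since raw importance scores are reported on different scales by different libraries.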

Sub-Indices Aggregation
Once the weights are assigned and the sub-index values are calculated, the EWQI is computed by aggregating the values using either geometric or arithmetic mean, logarithmic function or root square. The formula for the EWQI is given in Equation (3). Here, n is the number of selected parameters, which are mentioned in Section 3.1, q_n is the quality rating or sub-index of the nth parameter, which is calculated by Equation (1), and W_n is the relative weight of the nth parameter, calculated by Equation (2).
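As a rough illustration of the aggregation step, the sketch below assumes a weighted arithmetic sum, EWQI = Σ W_n q_n, which is one of the aggregation options named above; the exact form of Equation (3) is not reproduced in this copy, so this choice is an assumption. The function name `ewqi` and the sample sub-indices and weights are hypothetical.

```python
def ewqi(sub_indices, weights):
    """Aggregate 0-100 sub-indices q_n with relative weights W_n.

    Assumes weighted arithmetic aggregation (an assumption of this sketch).
    """
    missing = set(weights) - set(sub_indices)
    if missing:
        # How the index treats absent parameters is not specified in the text,
        # so this sketch simply refuses to guess.
        raise KeyError(f"no sub-index for: {sorted(missing)}")
    return sum(weights[name] * sub_indices[name] for name in weights)


# Hypothetical sub-indices (0-100) and relative weights (sum to 1) for one sample.
q = {"EC": 40.0, "SDD": 80.0, "DO": 60.0}
W = {"EC": 0.5, "SDD": 0.25, "DO": 0.25}
value = ewqi(q, W)  # 0.5*40 + 0.25*80 + 0.25*60
```

With weights that sum to one and sub-indices in the 0-100 range, the aggregated value also stays within 0-100, so it can be passed directly to a class-range lookup.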

Methodology
The methodology for the application of the EWQI is given in detail in this section. The proposed EWQI is applied on the study area of the Rawal watershed. Figure 1 shows the high-level methodology applied for the acquisition of features and development of the new index.

[Figure 1 caption, partial: ... (9), calculation and application of the EWQI for the study area (10, 11).]

Study Area
The Rawal watershed [33] begins at a lake located at latitude 33°42′ N, longitude 73°7′ E in Islamabad, Pakistan, which supplies water to a population of around 3 million. Using Geographic Information System (GIS) tools, a water stream network is extracted from the Rawal watershed to analyze the water-associated properties of the area, excluding the land attributes. The SRTM [34] data is mosaicked for the selected region to create a DEM, and subsequently, a stream network is clipped by applying the GIS hydrology tools. Figure 2 shows the DEM of the study area, i.e., the Rawal watershed encompassing the stream network.

Data Acquisition
Four categories of data are acquired: (i) physico-chemical parameters, (ii) hydrological and topographical parameters, (iii) air parameters, and (iv) meteorological parameters. The sources of the extracted parameters are listed in Table 2.

Physico-Chemical Parameters
The data were acquired from the Google Earth Engine and comprised S2-MSI L1C images for extracting water quality parameters. The SRTM data is used for the creation of the DEM. The S2-MSI L1C product contains Top of Atmosphere (TOA) images scaled by a factor of 10,000. These images were observed for the monsoon season, i.e., June to September of 2018 to 2022, for the Rawal stream network. Different band combinations of the images were used to derive the parameters for the stream network, using the adapted equations for TDS, pH, EC, SDD, DO, Tur and chl-α listed in Table 3. Figure 3 shows a sample of the parameters extracted for July 2020.

Hydrological and Topographical Parameters
The hydrological and topographical parameters were acquired from the different sources listed in Table 2. The slope for the study area was acquired from the DEM using ArcGIS tools. The slope attribute extracted for the study area is classified into six classes: (i) flat (0-3%), (ii) gentle sloping (3-8%), (iii) sloping, ... (Figure 4). The aspect parameter determines the exposure to the sun, which influences which plants colonize the slope and, in turn, which animals may seek food there. The Rawal watershed has a south-facing slope, which is warmer, and the soil tends to dry out faster on such slopes. The soil type is also an important attribute in assessing the quality of the water, since soils with a higher infiltration capacity can decrease the runoff to a great degree. The soil types for the Rawal watershed are classified as (i) Be (eutric cambisols) and (ii) Rc (calcaric regosols), with a 99:1 ratio. The eutric cambisols class lies in hydrologic soil group B, which means that such soils have a moderate infiltration rate.
Moreover, the geological formations of the study area are classified as Cenozoic and Upper Paleozoic (Dev, Car, Per) with a 44:56 ratio. The lithology of the Rawal watershed consists of siliciclastic sedimentary consolidated (Ss) and mixed sedimentary consolidated (Sm) rocks with a 44:56 ratio. Such rocks have a high resistance to erosion and a poor solubility rate. Additionally, the type of land use is an important factor in determining the behaviour of the watershed, as it affects the water infiltration rate. The land use/land cover parameter for the watershed is classified as (i) trees, (ii) shrubland, (iii) grassland, (iv) cropland, (v) built-up, (vi) barren/sparse vegetation, (vii) open water, and (viii) herbaceous wetland.

Air Parameters
The air parameters were extracted from S5P-L2 satellite images that comprise six pollutants: CO, NO₂, O₃, SO₂, HCHO and CH₄, shown in Figure 5. The NO₂ concentrations are extracted using band 4 of the TROPOMI L2's UV, UV-VIS spectrometer [50]. Band 3 of the UV-VIS spectrometer is used to derive the HCHO [51], O₃ [52] and SO₂ [53] concentrations. Band 7 of the SWIR spectrometer is used to measure CH₄ and CO concentrations [54].

Meteorological Parameters
Air temperature, wind speed and total precipitation were extracted from the ERA5-CRP [55], shown in Figure 6. This project provides a climate data store that was assembled using data assimilation and advanced modelling to bring historical observations into a globally consistent form. The air temperature is given at a height of 2 m, and the wind speed at a height of 10 m, above the surface of the Earth.

Data Preprocessing
Data were acquired using the Google Earth Engine [56] software. The maps were prepared with ArcMap 10.8 [57]. The S2-MSI L1C, S5P-L2 and ERA5-CRP images were preprocessed to extract the parameters from the selected Rawal watershed DEM. GIS clipping tools were used to select the target boundaries from the image to extract the area of interest. A total of 4998 points were extracted from each monsoon month in the time period of July 2018 to August 2022, giving a total of 284,889 or approximately 0.3 M sample points. These sample points were extracted from the Rawal stream network, as the watershed region covers both land and water. To make a dataset with all the features, the four categories of data were joined based on matching dates and latitude-longitude. The hydrological and topographical data is stable, i.e., it generally remains the same regardless of time, and is therefore joined on the basis of matching latitude-longitude only. Once the sample points are extracted and the dataset with the twenty-two parameters is created, a set of preprocessing techniques is performed that includes:

1. Replacing the missing values: The missing values are replaced using imputation techniques. The numerical data is imputed with the average or mean. The categorical data is imputed using the most frequent value method.

2. Replacing the categorical data: The categorical data is converted to numeric form by using the encoding technique. For example, geology ...

3. Splitting the dataset: The data are split into train and test sets with a 60:40 ratio.
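The three preprocessing steps above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' pipeline: the encoding scheme is not specified in the text, so simple integer (label) encoding is assumed, and the function names, sample values, shuffling and random seed are hypothetical.

```python
import random
from collections import Counter


def impute_mean(column):
    """Replace missing numeric values (None) with the mean of the observed ones."""
    present = [v for v in column if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in column]


def impute_most_frequent(column):
    """Replace missing categorical values (None) with the most frequent category."""
    present = [v for v in column if v is not None]
    mode = Counter(present).most_common(1)[0][0]
    return [mode if v is None else v for v in column]


def label_encode(column):
    """Map each category to an integer code (assumed encoding scheme)."""
    codes = {cat: i for i, cat in enumerate(sorted(set(column)))}
    return [codes[v] for v in column], codes


def split_60_40(rows, seed=0):
    """Shuffle the rows and split them into 60% train and 40% test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = round(0.6 * len(rows))
    return rows[:cut], rows[cut:]


# Illustrative values only; "Ss" and "Sm" are the lithology codes used in the text.
ph = impute_mean([7.0, None, 8.0])
lithology = impute_most_frequent(["Ss", "Sm", None, "Sm"])
encoded, mapping = label_encode(lithology)
train, test = split_60_40(range(10))
```

Tree-based models are largely insensitive to the particular integer codes chosen, which is one reason a simple encoding of this kind is often sufficient for feature-importance scoring.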

Results and Discussion
Once the dataset is compiled and preprocessed using the methods mentioned in Section 4, the selected twenty-two parameters are used for calculating the EWQI. These include six air pollutants (CO, NO₂, O₃, SO₂, HCHO and CH₄), six hydrological parameters (lithology, land use/land cover, soil type, slope, aspect and geology), three meteorological variables (air temperature, wind speed and total precipitation) and seven physico-chemical water quality features (TDS, pH, EC, SDD, DO, Tur and chl-α). The selection of parameters is reassessed in the "Weight Assignment" stage using the tree-based algorithms. Next, the selected parameters are transformed using min-max normalization to a range of 0-100. The physico-chemical parameters DO and chl-α are measured in mg/L, while air pollutants are measured in mol/m². Thus, this step is necessary to obtain a uniform dataset.
Then, a feature weighting technique is applied. For this step, the dataset is divided into 60:40 train and test sets, and results are reported for both. In addition to the 40% test set used to verify the proposed technique, a further set of test data is acquired for the year 2020. This set is used to explore whether the index remains functional under seasonal restrictions or with certain missing parameters, in comparison with the state-of-the-art NSFWQI method. It contains days from seasons other than the monsoon months that were originally used in the training dataset. This helps in the analysis and verification of the newly developed EWQI. The optimal number of clusters for the preprocessed training data is four. The clustering is performed to categorize the data samples as Class 1, 2, 3 and 4. Once the data is clustered and labelled, tree-based feature weighting is applied to obtain the parameter scores. Table 4 shows the weighting methods, scores and accuracy achieved on the training data. The best accuracies of 99.34% and 99.1% were achieved with the CatB and LGBM methods, respectively. LGBM gave its best accuracy with 21 parameters, with the "Geology" parameter discarded, whereas CatB gave its best accuracy with all 22 parameters. Table 4 also shows the parameter scores of the feature weighting methods. XGB gave the highest scores to the EC, SDD and lithology parameters. RF gave the highest scores to EC, TDS and geology, whereas LGBM gave SDD, pH, DO and O₃ the highest scores. Geology, EC, lithology and DO are the top scorers for CatB. This indicates that multiple parameters play a part in categorizing the water. To test this hypothesis, the weighting methods were also tested for physico-chemical parameters alone and for physico-chemical, air and meteorological parameters together. Table 5 shows the results for the selected parameters for the top performing algorithms, where the highest accuracy achieved was up to 82%.
The dependencies of different parameters on the water quality can be seen with the inclusion of all 22 parameters that gave a 99% accuracy rate.
The results of the classification achieved with the top two performing feature weighting techniques on the test set are given in Table 6. CatB weights classified the test set into four classes, i.e., bad (82.7%), medium (16%), poor (1.2%) and good (0.005%), whereas LGBM weights classified the test data into two classes, i.e., bad (82.6%) and medium (17%). The test data was also classified using the traditional NSFWQI method. The weights in the NSFWQI were assigned based on its own selection of physico-chemical parameters. Thus, the NSFWQI weights need to be readjusted for the physico-chemical parameters used here. Figure 7 shows the classification of the test set using EWQI (CatB weighting), NSFWQI (without weight updates) and NSFWQI (with weight updates), i.e., the number of samples that fall in each class (poor, bad, medium and excellent). With NSFWQI, more than 75% of the data remains unclassified, even with weight updates. This, in turn, supports the reliability and accuracy of the results achieved with the EWQI.

In addition to the 40% test data acquired for the monsoon months (June-September), 4998 sample points are collected from each non-monsoon or winter season of the year 2020. These data are used to further analyze the performance of the EWQI and are compared with the traditional NSFWQI. Table 7 shows the results for the test sets of six months, i.e., January, February, March, April, November and December 2020. The NSFWQI failed to classify the test subsets for these months of 2020. The parameters used for NSFWQI are the seven physico-chemical water quality parameters. The test sets for the year 2020 had some missing parameters: for November and December the meteorological parameters are missing, and for January 2020 CH₄ is missing.
However, even with the missing parameters, the EWQI weights are applicable and have classified the data, in contrast to the NSFWQI. Moreover, with EWQI the classified samples have a 55:45 ratio of medium to bad class throughout January to March. For April, this ratio shifted to 90:10. For November, the ratio shifted further to 45:55 and finally, for December, the ratio was 10:90. The 10:90 ratio of medium to bad class indicates the river water pollution that occurs due to anthropogenic activities during winter [58]. This shows that seasonal variations are visible with the EWQI method even though it is trained on data collected for the monsoon months only. Figures 10 and 11 display the test samples for January and February using the EWQI (LGBM) method. These classification maps are produced in ArcMap after applying post-classification smoothing [59] using spatial analyst tools.

Although the EWQI has all six levels of water quality like the NSFWQI method, the Rawal stream network does not contain samples that fall in all six classes, as seen with the acquired data. This is a limitation of the study, and in future, other lakes and watersheds can be investigated with the EWQI to show samples that belong to all the classes.
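The monthly class ratios quoted in this section are percentage shares of samples per class. A minimal sketch of that bookkeeping is given below; the function name `class_shares` and the twenty sample labels are hypothetical and chosen only so that the result reproduces a 55:45 split.

```python
from collections import Counter


def class_shares(labels):
    """Return the percentage of samples that fall in each class."""
    if not labels:
        raise ValueError("at least one classified sample is required")
    counts = Counter(labels)
    return {cls: 100.0 * n / len(labels) for cls, n in counts.items()}


# Hypothetical labels for one month (illustrative, not data from the study).
labels = ["medium"] * 11 + ["bad"] * 9
shares = class_shares(labels)
```

Computing the same shares for each month in turn gives the sequence of ratios from which the seasonal shift is read.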

Conclusions
The physico-chemical, hydrological and topographical, air pollutant and meteorological parameters were extracted from S2-MSI L1C, SRTM DEM, S5P-L2 and ERA5-CRP, respectively, for the Rawal stream network for the monsoon months (June to September) of the years 2018 to 2022. The water quality was assessed using WQI methodology to rank the water bodies. However, the application of the WQIs on water samples is a biased approach, as each index is built specific to certain locations or water types, is sensitive to specific parameter concentrations or is dependent on the weights assigned. Such limitations make the traditional WQIs unsuitable for application on any general water body. Thus, this study aimed to determine the impact of other natural factors in the environment in order to understand and classify the water quality using an enhanced water quality index method. An enhanced indexing methodology is proposed that, unlike the traditional or state-of-the-art WQI, is based on a multitude of parameters and machine learning techniques. The first step of building the EWQI method was the parameter selection, where 22 physico-chemical, hydrological and topographical, air pollutant and meteorological parameters were selected, e.g., lithology, geology, soil type, wind speed, air temperature, CO, NO₂, O₃, DO and TDS. Next, the sub-index calculation was performed using the min-max normalization technique to transform the data to the 0 to 100 range. The third and most crucial step was assigning weights, where the training data was clustered with K-means, using the Elbow method to find the value of K. The final weights were then calculated on the clustered training data, with the LGBM and CatB models giving a 99% accuracy. These weights were then assigned to the test data. Once the sub-indices and weights were calculated, the sub-indices were aggregated by applying the formula given in Equation (3). The final step was the classification of the EWQI values using the WHO ranking system.
The conclusions drawn from the analysis of the newly proposed indexing technique are that the use of tree-based LGBM weighting and min-max normalization methods can lead to the accurate classification of the stream network as compared to the traditional NSFWQI. Moreover, the parameters, i.e., physico-chemical and other natural factors such as air pollutants, air temperature, slope, aspect, etc. all play a role in categorizing the water quality where EC, SDD, DO, lithology and geology are given high scores or weights with the feature weighting methods LGBM and CatB. Contrary to the NSFWQI, the missing parameters do not influence the classification of the water body using the EWQI. Even with more than five missing parameters for November and December 2020, the classification maps are produced with each sample assigned to a bad, medium or good class. The EWQI works well for all seasons as the seasonal variations can also be observed for January to December where the water quality class ratio shifted from 55:45 to 10:90 ratio for medium to bad class. In contrast, NSFWQI failed to classify the samples. Thus, the new and improved EWQI method will help remove the uncertainties involved in the traditional methods and can contribute to water management planning on a global scale. In the future, the EWQI can be explored further for other water bodies such as Khanpur, Mangla and Tarbela Dam.

Data Availability Statement:
The data may be requested by reaching out to authors through email.

Acknowledgments:
Research and development of this study were conducted in IoT Lab, NUST-SEECS, Islamabad, Pakistan and at the Sheila and Robert Challey Institute for Global Innovation and Growth at North Dakota State University, USA.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: