Machine Learning-Enhanced Monitoring and Assessment of Urban Drinking Water Quality in North Bhubaneswar, Odisha, India

Samal, Kshyana Prava; Thakur, Rakesh Ranjan; Panda, Alok Kumar; Nandi, Debabrata; Pati, Alok Kumar; Pegu, Kumarjeeb; Đurin, Bojan

doi:10.3390/limnolrev25030044


by Kshyana Prava Samal ¹, Rakesh Ranjan Thakur ², Alok Kumar Panda ³, Debabrata Nandi ⁴, Alok Kumar Pati ⁵, Kumarjeeb Pegu ⁶ and Bojan Đurin ⁷,*

¹ Centre for Water Research and Climate Change, School of Civil Engineering, Kalinga Institute of Industrial Technology, Deemed to be University, Bhubaneswar 751024, Odisha, India

² Centre of Remote Sensing and Disaster Management, School of Civil Engineering, Kalinga Institute of Industrial Technology, Deemed to be University, Bhubaneswar 751024, Odisha, India

³ Environmental Science Laboratory, School of Applied Sciences, Kalinga Institute of Industrial Technology, Deemed to be University, Bhubaneswar 751024, Odisha, India

⁴ Department of Remote Sensing and GIS, Maharaja Sriram Chandra Bhanja Deo University, Baripada 757003, Odisha, India

⁵ Department of Computer Application, Siksha ‘O’ Anusandhan Deemed to be University, Bhubaneswar 751030, Odisha, India

⁶ Department of Law, National Law University Odisha, Cuttack 751013, Odisha, India

⁷ Department of Civil Engineering, University North, 42000 Varazdin, Croatia

* Author to whom correspondence should be addressed.

Limnol. Rev. 2025, 25(3), 44; https://doi.org/10.3390/limnolrev25030044

Submission received: 13 April 2025 / Revised: 9 August 2025 / Accepted: 12 August 2025 / Published: 12 September 2025


Abstract

Access to clean drinking water is crucial for any region’s social and economic growth. However, rapid urbanization and industrialization have significantly deteriorated water quality, with pollution from domestic, agricultural, and industrial sources posing severe threats. This study presents an innovative framework for assessing water quality in North Bhubaneswar that integrates the Water Quality Index (WQI) with statistical analysis, geospatial technologies, and machine learning models. The WQI, calculated using the Weighted Arithmetic Index method, provides a single composite value representing overall water quality based on several key physicochemical parameters. To evaluate potable water quality across 21 wards in the northern zone, several key parameters were monitored, including pH, electrical conductivity (EC), dissolved oxygen (DO), hardness, chloride, total dissolved solids (TDSs), and biochemical oxygen demand (BOD). The resulting WQI values indicated overall water quality ranging from excellent to good. Furthermore, Principal Component Analysis (PCA) revealed a strong positive correlation (r > 0.6) between pH, conductivity, hardness, and alkalinity. To enhance the accuracy and reliability of the assessment, multiple machine learning models, namely Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Naïve Bayes (NB), were applied to classify water quality based on these parameters. Among them, the Decision Tree (DT) and Random Forest (RF) models demonstrated the highest precision (91.8% and 92.7%, respectively) and overall accuracy (91.7%), making them the most effective at predicting water quality. By integrating WQI, machine learning, and statistical analysis, the study emphasizes the importance of continuous water quality monitoring and offers data-driven recommendations to ensure sustainable access to clean drinking water in North Bhubaneswar.

Keywords:

WQI; machine learning; sustainable development; urbanization; water quality assessment; GIS

1. Introduction

Water scarcity and inadequate sanitation hinder economic growth and public health, affecting people worldwide who lack access to clean drinking water and are vulnerable to waterborne diseases, particularly in developing nations [1,2]. More than two billion people lack access to clean drinking water, and the spread of waterborne diseases leads to high disease rates in developing nations [3,4]. The rapid industrial growth and urban development in North Bhubaneswar have worsened water contamination, as pollutants from domestic, agricultural, and industrial sources have entered the environment [5,6,7]. Contamination of the river and other water resources affects about one and a half million people throughout the city [8,9]. Assessing water quality requires an organized strategy that addresses existing implementation challenges [10]. Combining the Water Quality Index (WQI) with geo-statistical modeling and contemporary assessment systems enables effective observation and improved monitoring of water quality [11,12]. Poor management of sanitation practices and the release of pollutants lead to severe environmental damage, endangering public health [13]. In the city, industrial contamination combined with household sewage spills severely degrades water resources, while inadequate water purification facilities and agricultural runoff create additional problems [14]. Water sustainability and proper infrastructure are crucial for establishing reliable drinking water systems and ensuring environmental safety within the city [15].

Multiple researchers have used the Weighted Arithmetic Water Quality Index (WAWQI) method to study urban water quality and surface waters, identifying contaminated areas as they developed protection strategies for delivering safe water resources [16,17,18]. Scientists observe a growing emergency due to population growth, declining water quality, and shrinking water resources. Pipe water is contaminated with increasing levels of chemical and organic pollutants, posing a significant risk to human health. According to the availability of surface water, contaminated water must be treated before being supplied to the distribution network or pipeline system [19].

This study utilized Geographic Information Systems (GIS) to analyze the spatial distribution of physicochemical parameters and identify patterns in water quality across the study area. By integrating GIS with hydro-geochemical characteristics and statistical analysis, we aim to develop a robust framework for mapping and monitoring water quality assessments [20].

Hydro-geochemical characteristics, when combined with statistical analysis, form a practical framework for mapping and monitoring water quality [21]. Research today primarily focuses on the quality of piped water in central Bhubaneswar; however, studies on water quality in the urban supply are scarce for the northern regions of the city. This comprehensive assessment provides a deeper understanding of water issues throughout the city and identifies essential requirements for implementing sustainable management systems. In this context, the present study aims to assess the quality of treated drinking water supplied through the municipal distribution network (tap water) across 21 wards of North Bhubaneswar. The purpose is to identify spatial variations and potential contamination zones in the supply water using a combination of the WAWQI, Principal Component Analysis (PCA), and machine learning classification models. The findings aim to support policymakers and urban planners in implementing targeted improvements to water supply infrastructure and public health safety.

Scientists have noted the increasing risk of water pollution resulting from industrial growth and inadequate waste disposal systems [22]. The accelerating population growth and increasing industrial activities in North Bhubaneswar have resulted in severe water pollution problems, which demand advanced water quality evaluation methods. Present water quality assessment systems fail to track pollution patterns across different testing areas, neglect the effects of multiple pollutants on one another, and thus require updated assessment methods [23]. This research develops a high-end water quality assessment system that integrates WQI analysis techniques with geographic statistical methods. The study employs a WQI-based geostatistical framework to investigate temporal and spatial variations in water quality and identify key factors affecting water quality, thereby supporting informed decision-making [24]. The research revolutionizes water quality assessment by achieving these objectives, thereby better tracking environmental conditions while simultaneously lowering public risks and establishing sustainable community development in North Bhubaneswar.

The research findings can benefit approximately 1.5 million residents by informing policy creation, the development of enhanced water management strategies, and future investigations.

Population growth has created extreme surface water demands in recent decades [25]. Surface water resources are under severe stress due to the combined effects of urban growth and pollution [26]. Irrigation resources are diminishing, and water conditions are worsening due to population growth and the concurrent impact on industrial and agricultural development, as well as dairy farming sectors and industrial production [27,28]. New economic development patterns have a profoundly destructive effect on water quality, making it progressively more challenging to maintain proper water supply operations [29]. The rapidly growing population of Odisha, like that of other states, has made the provision of human drinking water a priority as health awareness continues to expand [30]. The expected population expansion necessitates the improvement of water distribution systems to meet future human water needs. Consuming contaminated water through drinking or contact with the substance results in various serious health problems, including cholera, hepatitis A, dysentery, and diarrhea.

Sampling points in the water distribution network were evaluated for pH, chloride, hardness, and dissolved oxygen, together with alkalinity, electrical conductivity (EC), total dissolved solids, and biochemical oxygen demand [31]. All water samples were tested at the Environmental Engineering Laboratory, KIIT DU. PCA was used to simplify the dataset while retaining its essential information, supported by checks of the correlations among parameters. The findings aligned with the outcomes of earlier research that employed PCA to identify key components in water quality analysis [32]. PCA was chosen because it provides a standardized approach to organizing the variables of a water quality dataset. The leading causes of water quality deterioration are aged pipeline materials and waste from industries, residential zones, institutional buildings, and landfills. Maps generated in ArcGIS show the areas in each ward of North Bhubaneswar where water quality monitoring requires attention. This spatial analysis helps policymakers identify priority areas. The WQI data and PCA results show how water quality varies among wards. Achieving universal access to sustainable drinking water requires continuous monitoring through modern infrastructure, alongside dedicated plans to meet Goal 6 of the Sustainable Development Goals [33].

This research pioneers an integrated approach to water quality assessment in North Bhubaneswar, leveraging WQI analysis and geospatial statistical methods to identify key factors affecting water quality. By harnessing GIS and PCA, spatial maps of water quality parameters were generated, which will aid in making policy decisions for sustainable water management.

Some researchers in India have examined land use changes and hydrological impacts while, on a global scale, studies have been conducted to analyze the effects of climate variability on water resources [34]. This research advances the field by combining geospatial analysis with machine learning to investigate complex interactions among land use, climate, and hydrology in a specific watershed. This study offers nuanced insights into system dynamics, enabling the development of more effective water resource management strategies tailored to India’s regional needs.

2. Materials and Methods

2.1. Study Area

The study area is situated in the northern section of the Bhubaneswar Municipal Corporation (BMC) administrative unit of Khordha District, Odisha, between latitudes 20°16′ and 20°24′ N and longitudes 85°45′ and 85°54′ E, with an area of approximately 255 km², as shown in Figure 1. The terrain is undulating with an average elevation of 46 m above sea level [35]. The river system comprises the Kuakhai in the east and the Daya in the south. Streams such as Jhumka Nala and Gangua Nala supply water to the Daya River on the southern side of the city. The city has various freshwater lakes, including the historic Bindusagar and Vanivihar Lake [36]. It experiences a humid tropical climate with three distinct seasons: winter, summer, and the rainy season. The summer season (April–May) is very hot and humid, with day temperatures ranging from 31 to 45 °C, whereas the winter season (December–January) is moderately cold, with temperatures fluctuating between 12 and 30 °C. The average annual rainfall in the area has been between 1450 and 1550 mm for the last 10 years. The monsoon season (July–September) accounts for the majority (85%) of the total annual rainfall, with the highest amounts recorded in August. The topography is dominated by floodplains and alluvial soils in the southeast and by hilly terrain in the northwest [37]. The city’s population density is approximately 2131 persons per square kilometer, as per the 2011 Census (Bhubaneswar Municipal Corporation, 2011) [38]. The Indian government has chosen Bhubaneswar as part of its smart city development strategy [39]. In Figure 1, numbered labels indicate the ward numbers of North Bhubaneswar, which are the primary administrative units considered for water sampling in this study. These ward boundaries were overlaid to depict the spatial distribution of the 21 selected wards where water samples were collected. 
Additionally, the figure highlights several key locations within the study area, including major urban zones such as educational institutions, healthcare centers, commercial hubs, high-density residential areas, and rapidly developing construction sites. Specifically, these include Apollo Hospital, KIIT Deemed to be University, Utkal Hospital, Infocity, Esplanade Mall, and the International Kalinga Stadium, all of which are critical hotspots with significant human activity, infrastructure load, and increased risk of pipeline stress or contamination. These locations were intentionally highlighted because a substantial portion of the water samples were collected around these zones, where risks of pipeline leakage, contamination, and infrastructural stress are relatively higher.

Leak points were identified through a thorough inspection, which revealed damage and fissures caused by contractor negligence during pipe installation and connection in the leak-prone wards of the study area. The majority of the leaks were found at coupler, elbow, valve, and saddle clamp connections. Backfilling and bedding sand that contains large, sharp stones scratches and punctures the pipes and fittings [40]. The fluctuation in dissolved oxygen (DO) is directly proportional to the corrosion rates caused by redox couple reactions within the pipeline system [41]. DO is consumed in reactions with pipeline materials such as copper, steel, and iron [42]. Pipe materials play a critical role in corrosion and in the degradation of water quality in water distribution systems [43]. This infrastructural vulnerability is closely linked to the water quality indicators analysed in this study. Parameters such as DO, EC, and hardness are directly impacted by leaching and contamination resulting from pipeline damage. These considerations are reflected in both the WQI calculation and the multivariate analysis, where such parameters contributed significantly to the observed spatial variation. In light of the above, the study focused on Bhubaneswar’s northern region, which is home to several educational institutions and significant new construction developments. Figure 2 describes the detailed process.

2.2. Quality Parameters

This study considers the following water quality parameters: pH, DO, EC, Alkalinity, Hardness, Chloride, Total Dissolved Solids (TDSs), and Biochemical Oxygen Demand (BOD). All the samples were withdrawn from the municipal tap water interfaced with the surface water treatment system of Bhubaneswar Municipal Corporation (BMC). The northern part of the Bhubaneswar city was studied, which includes 21 wards (Ward Nos. 1 to 14, 16 to 21, and 26). The sites for collection were chosen based on the risk of surface and pipe water pollution, which included industrial areas, commercial markets, densely populated regions, construction sites, and schools and universities.

A total of 105 water samples (5 per ward) were collected in 1-L, pre-cleaned polyethylene bottles, stored at 5 °C, and transported to the Environmental Engineering Laboratory at KIIT DU by following the Indian Standard and WHO guidelines [44]. All the results were compared with the values mentioned in Table 1. The laboratory practices were conducted as outlined in the Standard Methods for the Examination of Water and Wastewater ensuring accuracy through the use of sterilized glassware and calibrated instruments [45]. Measurements of pH and DO were performed using a standard multiparameter analyzer. EC and TDSs were assessed using a calibrated conductivity meter. Alkalinity and hardness were determined through acid-base titration and ethylenediaminetetraacetic acid (EDTA) titration, respectively. Chloride concentration was measured using argentometric titration, and BOD was analyzed by incubating the samples at 20 °C for 5 days. The WQI was calculated using the mean values of each parameter of the five samples per ward. The samples were collected during the winter season of 2023–2024 (November–February) to reflect dry-season conditions.

2.3. Calculation of WQI

The WQI was calculated ward-wise using the average values of the parameters for a particular location. Eight parameters were considered for the WQI computation: pH, DO, conductivity, alkalinity, hardness, chloride, TDSs, and BOD. Lead, copper, and zinc were excluded from the WQI computation because their concentrations were well within the desirable limits for supply water.

The drinking water quality standards issued by the BIS in 2012 and the WHO in 2011 were used to calculate the WQI. The WQI is an effective tool for informing the public and policymakers about water quality [46] and for assessing and communicating the overall status of water quality, supporting informed choices in resource management and health programs [47]. It combines the water metrics considered essential for water quality, which are listed in Table 1 together with the World Health Organization (WHO) guideline limits used to evaluate water quality in this study. The WQI computed using the weighted arithmetic index method is employed in this paper to evaluate the impact of contaminants on supply water, since it is considered the most suitable option in the given situation [48].

The WQI is given in Equation (1):

$$WQI = \frac{\sum_{i=1}^{n} q_i w_i}{\sum_{i=1}^{n} w_i} \tag{1}$$

where $q_i$ is the quality rating (sub-index) of the $i$th water quality parameter, and $w_i$ is the unit weight of the $i$th water quality parameter. The quality rating $q_i$, which relates the value of the parameter in polluted water to the standard permissible value, is obtained as follows:

$$q_i = 100 \left( \frac{v_i - v_{i0}}{s_i - v_{i0}} \right) \tag{2}$$

where $v_i$ is the estimated value of the $i$th parameter, $v_{i0}$ is the ideal value of the $i$th parameter, and $s_i$ is the standard permissible value of the $i$th parameter. In most cases, $v_{i0}$ is 0, except for pH and DO: for pH, $v_{i0}$ is 7 and, for DO, $v_{i0}$ is 14.6 mg/L. The unit weight ($w_i$), which is inversely proportional to the value of the recommended standard, is obtained from Equation (3):

$$w_i = \frac{k}{s_i}, \quad \text{where } k = \frac{1}{\sum_{i=1}^{n} \frac{1}{s_i}} \tag{3}$$
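Equations (1)–(3) can be illustrated with a short Python sketch. This is a minimal example, not the computation used in the study: the function names are hypothetical, only three parameters are shown, and all numerical values in `standards` and `measured` are illustrative placeholders rather than entries from Table 1 (the ideal values of 7 for pH and 14.6 mg/L for DO follow the text above).

```python
def unit_weights(standards):
    """Eq. (3): w_i = k / s_i, with k = 1 / sum(1 / s_i)."""
    k = 1.0 / sum(1.0 / s for s in standards.values())
    return {name: k / s for name, s in standards.items()}


def quality_rating(value, standard, ideal=0.0):
    """Eq. (2): q_i = 100 * (v_i - v_i0) / (s_i - v_i0)."""
    return 100.0 * (value - ideal) / (standard - ideal)


def wqi(measured, standards, ideals):
    """Eq. (1): weighted arithmetic mean of the quality ratings."""
    weights = unit_weights(standards)
    weighted_sum = sum(
        quality_rating(measured[p], standards[p], ideals.get(p, 0.0)) * weights[p]
        for p in standards
    )
    return weighted_sum / sum(weights.values())


# Placeholder inputs (assumptions for illustration only).
standards = {"pH": 8.5, "DO": 5.0, "EC": 400.0}   # s_i
ideals = {"pH": 7.0, "DO": 14.6}                  # v_i0; 0 for all other parameters
measured = {"pH": 7.6, "DO": 7.8, "EC": 250.0}    # v_i, e.g. a ward-wise mean

ward_wqi = wqi(measured, standards, ideals)
print(round(ward_wqi, 2))
```

Because $k$ normalizes the unit weights so that they sum to one, the denominator of Equation (1) equals one in this sketch, and parameters with stricter (smaller) permissible values receive larger weights.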

2.4. Machine Learning Models

In the present study, supervised machine learning (ML) algorithms were employed to classify water samples based on their WQI category. The models were initially trained using eight physicochemical parameters measured from municipal tap water samples: pH, EC, DO, alkalinity, hardness, chloride, total dissolved solids (TDSs), and biochemical oxygen demand (BOD). These variables were selected for their regulatory relevance and importance in water quality assessment. To further enhance predictive accuracy and model robustness, the final model development incorporated a total of 17 input features grouped into three categories based on their relevance to groundwater quality and fluoride behavior in aquifer systems. The first category comprises the eight core physicochemical parameters, along with additional indicators, such as bicarbonate (HCO₃⁻), sulfate (SO₄²⁻), calcium (Ca²⁺), magnesium (Mg²⁺), sodium (Na⁺), potassium (K⁺), and fluoride (F⁻), which reflect geochemical interactions that affect water quality. The second category comprises spatial features, latitude, longitude, and well depth, which account for locational heterogeneity and aquifer depth variability. The third category includes temporal and metadata features: year of sampling, block name, and WQI class label. The year variable enables modeling of temporal trends, while the block name, encoded categorically, captures local administrative differences. The WQI class label served as the target output for classification tasks. The dataset was labeled based on these WQI classifications (e.g., good, excellent), and model performance was evaluated using 5-fold cross-validation to assess accuracy, precision, recall, and F1 score. Before model training, the dataset underwent several preprocessing steps to ensure quality, consistency, and compatibility with machine learning algorithms. 
Continuous variables were standardized to have a mean of zero and a variance of one, enabling uniform feature scaling and improving model convergence across various algorithms. Categorical variables, including the block name and WQI class label, were one-hot encoded to convert categorical data into a numerical format suitable for machine learning (ML) models. For features with missing values constituting less than 2% of the dataset, median imputation was applied to maintain data integrity while limiting bias. The resulting preprocessed dataset formed the input matrix (X), while the output vector (y) varied depending on the modeling task: fluoride concentration was the target variable for regression tasks, and the WQI class label served as the target variable for classification tasks.
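The 5-fold cross-validation and the precision, recall, and F1 score used for evaluation can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the folds are contiguous and unshuffled (the study does not state whether shuffling or stratification was used), and the metrics are computed for one class treated as "positive".

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 score for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# 105 samples, as collected in this study, split into 5 folds of 21.
folds = list(k_fold_indices(105, k=5))

# Toy labels (placeholders, not study data).
y_true = ["good", "good", "excellent", "excellent"]
y_pred = ["good", "excellent", "excellent", "excellent"]
print(precision_recall_f1(y_true, y_pred, positive="excellent"))
```

In each round, a model would be fitted on the `train` indices and scored on the `test` indices, and the five scores would then be averaged.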

Before model training, we applied a rigorous data preparation pipeline to ensure consistency and minimize bias. First, we performed an initial quality check to remove duplicate records and detect outliers using the interquartile range rule. Features with missing values (<2% of all entries) were imputed using the median, thereby preserving a central tendency without skewing the distributions. All continuous inputs (e.g., pH, EC, DO, alkalinity, hardness, chloride, TDS, BOD, HCO₃⁻, SO₄²⁻, Ca²⁺, Mg²⁺, Na⁺, K⁺, F⁻, well depth) were then standardized to zero mean and unit variance to harmonize scales and accelerate convergence of gradient-based algorithms. Categorical fields (block name, year, and WQI class label) were converted via one-hot encoding to create binary indicator variables, enabling tree and distance-based learners to effectively leverage locational and temporal metadata.
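The preparation steps described above (median imputation, interquartile-range outlier detection, standardization, and one-hot encoding) can be sketched with the Python standard library. The helper names are hypothetical, the 1.5 × IQR fence is the conventional choice and an assumption here, and the population standard deviation is assumed for scaling; the study does not specify these details.

```python
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    median = statistics.median(v for v in values if v is not None)
    return [median if v is None else v for v in values]


def iqr_outlier_flags(values, factor=1.5):
    """Flag values outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v < low or v > high for v in values]


def standardize(values):
    """Scale to zero mean and unit variance (population standard deviation)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # constant feature carries no information
    return [(v - mean) / std for v in values]


def one_hot(labels):
    """Return binary indicator rows and the sorted category order."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return rows, categories


# Toy values (placeholders, not study data).
ec = impute_median([120.0, None, 180.0])
flags = iqr_outlier_flags([10, 11, 12, 13, 14, 15, 16, 17, 18, 100])
z = standardize([1.0, 2.0, 3.0])
rows, cats = one_hot(["good", "excellent", "good"])
```

To avoid information leaking from held-out data, the median, mean, and standard deviation would normally be estimated on the training folds only and then applied to the test fold.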

2.4.1. Logistic Regression (LR)

For binary and multi-class classification tasks, the statistical and machine learning approach known as logistic regression (LR) is employed. It utilizes the logistic (sigmoid) function to estimate probabilities, thereby representing the connection between a dependent variable and one or more independent variables. The sigmoid function is suitable for probability prediction, as it converts any real-valued input into a value between 0 and 1. Logistic Regression classifies data according to a decision threshold, usually 0.5, and produces the probability of the target class. It is widely used due to its simplicity, interpretability, and effectiveness in linearly separable data applications across various fields, including medicine, finance, and the social sciences.

$$P(Y = 1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n)}} \tag{4}$$

where $P(Y = 1 \mid X)$ is the probability of the positive class, $\beta_0, \beta_1, \beta_2, \dots, \beta_n$ are the regression coefficients, $X_1, X_2, \dots, X_n$ are the input features, and $e$ is Euler’s number.
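As a minimal illustration of Equation (4), the sketch below evaluates the sigmoid of a linear combination of features and applies the 0.5 decision threshold mentioned above. The function names and the coefficient values are hypothetical; in practice the coefficients are estimated from training data.

```python
import math


def predict_proba(features, coefficients, intercept):
    """Eq. (4): P(Y = 1 | X) = 1 / (1 + exp(-(b0 + sum(b_j * x_j))))."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


def classify(features, coefficients, intercept, threshold=0.5):
    """Assign the positive class (1) when the probability reaches the threshold."""
    return 1 if predict_proba(features, coefficients, intercept) >= threshold else 0


# Placeholder coefficients for two standardized features.
probability = predict_proba([0.4, -1.2], coefficients=[1.5, 0.8], intercept=0.1)
label = classify([0.4, -1.2], coefficients=[1.5, 0.8], intercept=0.1)
```

A linear score of zero maps to a probability of exactly 0.5, which is why the default threshold corresponds to the sign of the linear score.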

2.4.2. Decision Tree (DT)

A decision tree is a supervised learning technique applied to both regression and classification tasks. It is a rule-based model that creates a tree-like structure by dividing the dataset into subsets according to the values of the input attributes. To optimize information gain, the algorithm uses impurity measures, such as the Gini Index or Entropy, to determine which feature at each internal node is most suitable for splitting the data. The process continues recursively until a stopping criterion, such as the maximum depth or the minimum number of samples per leaf, is met. Decision trees can efficiently handle both numerical and categorical data, and their rule-based structure makes them easy to understand.

Gini Index:

$$\mathrm{Gini} = 1 - \sum_{i=1}^{c} p_i^2 \tag{5}$$

Entropy (for Information Gain):

$$\mathrm{Entropy} = -\sum_{i=1}^{c} p_i \log_2 p_i \tag{6}$$

where $p_i$ is the probability of class $i$, and $c$ is the number of classes. The best split is chosen by maximizing the Information Gain.
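The impurity measures in Equations (5) and (6), and the information gain of a candidate binary split, can be sketched as follows. The function names are hypothetical and the labels are toy values; information gain is computed here as the parent entropy minus the size-weighted entropy of the two child nodes.

```python
import math
from collections import Counter


def class_probabilities(labels):
    """Relative frequency p_i of each class present in a node."""
    counts = Counter(labels)
    return [count / len(labels) for count in counts.values()]


def gini(labels):
    """Eq. (5): 1 - sum(p_i ** 2)."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))


def entropy(labels):
    """Eq. (6): -sum(p_i * log2(p_i))."""
    return -sum(p * math.log2(p) for p in class_probabilities(labels))


def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = len(parent)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - children


node = ["good", "good", "excellent", "excellent"]
gain = information_gain(node, ["good", "good"], ["excellent", "excellent"])
```

A node containing a single class has zero Gini impurity and zero entropy, so a split that separates the classes perfectly recovers the full parent entropy as gain.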

2.4.3. Random Forest

Random Forest is an ensemble learning method applied to problems involving regression and classification. During training, it constructs numerous decision trees and aggregates their results to provide more reliable and accurate predictions. To provide variety among the trees, each one is trained using a different subset of the features and data. The final prediction is obtained by majority voting in classification or by averaging the outcomes in regression. This method improves generalization and lessens the overfitting that is prevalent in single decision trees. Random Forest is widely used due to its robustness, high accuracy, and ability to handle large datasets with missing values. For regression, the averaged prediction is:

$$\hat{y} = \frac{1}{N} \sum_{i=1}^{N} T_i(X) \tag{7}$$

where $\hat{y}$ is the final prediction, $N$ is the number of trees, and $T_i(X)$ represents the prediction of the $i$th decision tree.
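The aggregation step of a Random Forest can be sketched without fitting any trees: the snippet below averages per-tree outputs as in Equation (7), takes a majority vote for classification, and draws a bootstrap sample of the kind used to diversify the trees. All names are hypothetical, and the per-tree predictions are placeholder values standing in for fitted trees.

```python
import random
from collections import Counter
from statistics import fmean


def bootstrap_sample(rows, rng):
    """Sample len(rows) rows with replacement, as done for each tree."""
    return [rng.choice(rows) for _ in rows]


def forest_regress(tree_predictions):
    """Eq. (7): average of the N per-tree predictions."""
    return fmean(tree_predictions)


def forest_classify(tree_votes):
    """Majority vote over the per-tree class labels."""
    return Counter(tree_votes).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample = bootstrap_sample(["w1", "w2", "w3", "w4"], rng)

averaged = forest_regress([1.0, 2.0, 3.0])
voted = forest_classify(["good", "excellent", "good"])
```

In this sketch a tied vote resolves to the label that appears first among the tied classes; library implementations may break ties differently.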

2.4.4. Support Vector Machine (SVM)

One practical supervised learning approach widely used for classification problems is the Support Vector Machine (SVM). It operates by determining the hyperplane that divides data points from multiple classes with the largest margin. This margin is the separation, expressed in terms of support vectors, between the hyperplane and the closest data points from each class. The goal of SVM is to increase this margin, thereby enhancing the model’s capacity to generalize to unseen data. SVM can handle both linear and non-linear classification problems using kernel functions, making it a versatile and practical approach, especially in high-dimensional spaces and complex datasets. The hyperplane Equation is as follows (8):

$$\omega \cdot X + b = 0 \tag{8}$$

where $\omega$ is the weight vector, $X$ is the input vector, and $b$ is the bias term. The optimization problem for SVM is given by Equation (9):

$$\min_{\omega, b} \; \frac{1}{2} \lVert \omega \rVert^{2} \tag{9}$$

subject to $y_i (\omega \cdot X_i + b) \geq 1, \; \forall i$.

SVM uses the kernel trick to map data into a higher-dimensional space for non-linearly separable data.
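For the linear case, the hyperplane of Equation (8) and the constraint attached to Equation (9) can be checked numerically. The sketch below does not solve the optimization problem; it only evaluates a given candidate hyperplane, using hypothetical function names and toy points, and reports the margin width $2 / \lVert \omega \rVert$ that minimizing $\frac{1}{2}\lVert \omega \rVert^2$ maximizes.

```python
import math


def decision_value(w, x, b):
    """Left-hand side of Eq. (8): w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def satisfies_margin(w, b, X, y):
    """Check y_i * (w . x_i + b) >= 1 for every sample (labels are +1 or -1)."""
    return all(yi * decision_value(w, xi, b) >= 1 for xi, yi in zip(X, y))


def margin_width(w):
    """Distance between the two margin hyperplanes: 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Toy, linearly separable points (placeholders, not study data).
X = [[1.0, 0.0], [-1.0, 0.0], [2.0, 1.0]]
y = [1, -1, 1]
w, b = [1.0, 0.0], 0.0

feasible = satisfies_margin(w, b, X, y)
width = margin_width(w)
```

Points for which the constraint holds with equality lie on the margin and are the support vectors; here these are the first two samples.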

2.4.5. K-Nearest Neighbors (KNN)

K-Nearest Neighbors (KNN) is a non-parametric, instance-based learning algorithm used for classification and regression. A data point is classified according to the majority vote of its k closest neighbors in the feature space. Although various metrics can be employed, Euclidean distance is commonly used to quantify the distance between points. KNN does not make any assumptions about the underlying data distribution, making it simple and versatile. It is sensitive to the choice of *k* and the scaling of features. Despite its simplicity, KNN can be effective for pattern recognition and classification tasks.

2.4.6. Distance Metric (Euclidean Distance)

$$d(X, Y) = \sqrt{\sum_{i=1}^{n} (X_i - Y_i)^{2}} \tag{10}$$

where $X$ and $Y$ are two data points, and $n$ is the number of features. The class is determined as follows:

$$\hat{y} = \arg\max_{c} \sum_{i=1}^{k} \mathbf{1}(y_i = c) \tag{11}$$

where $c$ is a class label, and $\mathbf{1}$ is an indicator function.
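Equations (10) and (11) translate directly into a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, the training points are toy values, and features are assumed to be already standardized, since KNN is sensitive to feature scale.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Eq. (10): square root of the summed squared feature differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_predict(train_X, train_y, query, k=3):
    """Eq. (11): majority class among the k nearest training points."""
    by_distance = sorted(
        zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query)
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy training set (placeholders, not study data).
train_X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
train_y = ["good", "good", "excellent", "excellent"]

predicted = knn_predict(train_X, train_y, query=[0.2, 0.5], k=3)
```

Choosing an odd `k` reduces the chance of tied votes in two-class problems.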

2.4.7. Naive Bayes (NB)

Naive Bayes is a fast and simple probabilistic classification algorithm based on Bayes’ Theorem. It assumes that the features are conditionally independent given the class label, which simplifies computation. Despite this strong independence assumption, Naive Bayes often performs well in real-world applications, especially with high-dimensional data. It calculates the posterior probability of each class given the input features and assigns the class with the highest probability. Naive Bayes is widely used in text classification, spam detection, and sentiment analysis due to its efficiency, scalability, and good performance with relatively small training data. Bayes’ Theorem is as follows:

$$P(C \mid X) = \frac{P(X \mid C) \, P(C)}{P(X)} \tag{12}$$

where $P(C \mid X)$ is the probability of class $C$ given features $X$, $P(X \mid C)$ is the likelihood of features given class $C$, $P(C)$ is the prior probability of class $C$, and $P(X)$ is the probability of features $X$. For Gaussian Naive Bayes (GNB), the likelihood follows a normal distribution:

$$P(X_i \mid C) = \frac{1}{\sqrt{2 \pi \sigma^{2}}} \, e^{-\frac{(X_i - \mu)^{2}}{2 \sigma^{2}}} \tag{13}$$

where $\mu$ and $\sigma^{2}$ are the mean and variance of feature $X_i$ in class $C$.
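Equations (12) and (13) can be combined into a small Gaussian Naive Bayes posterior calculation. This is a sketch under stated assumptions: the function names are hypothetical, the per-class means, variances, and priors are placeholder values rather than estimates from the study data, and the evidence $P(X)$ is obtained by summing the joint probabilities over the classes.

```python
import math


def gaussian_likelihood(x, mean, variance):
    """Eq. (13): normal density of one feature value within a class."""
    exponent = -((x - mean) ** 2) / (2.0 * variance)
    return math.exp(exponent) / math.sqrt(2.0 * math.pi * variance)


def posterior(features, class_stats, priors):
    """Eq. (12) with the naive independence assumption across features.

    class_stats maps each class to a list of (mean, variance) pairs,
    one pair per feature.
    """
    joint = {}
    for cls, stats in class_stats.items():
        probability = priors[cls]
        for x, (mean, variance) in zip(features, stats):
            probability *= gaussian_likelihood(x, mean, variance)
        joint[cls] = probability
    evidence = sum(joint.values())  # P(X)
    return {cls: p / evidence for cls, p in joint.items()}


# Placeholder statistics for one standardized feature and two classes.
class_stats = {"good": [(-1.0, 1.0)], "excellent": [(1.0, 1.0)]}
priors = {"good": 0.5, "excellent": 0.5}

post = posterior([1.0], class_stats, priors)
predicted_class = max(post, key=post.get)
```

With many features, products of small densities can underflow, so practical implementations usually sum log-probabilities instead of multiplying probabilities.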

3. Results

The analyses followed the APHA 2022 protocols, and the observed values were cross-verified against the WHO and BIS drinking water standards. The laboratory test results for the physicochemical parameters are presented as box plots in Figure 3A–H, which depict the range of each parameter from minimum to maximum values. Around 32.0% of areas deviated from the EC standard. Similarly, SameiGadia, Mancheswar Village, and Chakeisiani Area (all located in Ward 5) exhibited excessive hardness levels, attributed to natural geological processes, industrial discharge, and urban runoff; such hardness leads to pipe scaling and reduced soap effectiveness. Overall, 13.6% of areas deviated from the hardness standard. Parameters such as chloride, TDS, BOD, and alkalinity remained within permissible limits.

3.1. Illustration of Water Quality

The assessment of water quality across the 21 selected wards was carried out using eight key physicochemical parameters: pH, DO, EC, Total Dissolved Solids (TDSs), Alkalinity, Hardness, Chloride, and Biochemical Oxygen Demand (BOD). Descriptive statistics for each parameter, including minimum, maximum, mean, and standard deviation values, were used to evaluate water quality across the study area, as discussed below.

3.1.1. pH

As shown in Figure 3A, the maximum pH value was recorded in Ward No. 26 (Sranapalli) at 8.5, while the minimum was observed in Ward No. 20 (Sriram Nagar) at 5.6. All other surface water samples fell within the acceptable pH range of 6.5 to 8.5 specified by IS 10500.

3.1.2. DO

DO levels varied across wards, as illustrated in Figure 3B. Ward No. 20 (Tarini Nagar) had the highest recorded DO value at 8.4 mg/L, while the lowest value, 7.3 mg/L, was noted in Ward No. 2. According to IS 10500, the acceptable range for DO is 6.5 to 8 mg/L. Although Ward No. 20 initially showed a slight deviation, it is now approaching the standard range, indicating a return to safe levels.

3.1.3. EC

As shown in Figure 3C, the EC values of gravity-fed water across all study locations ranged from 67.7 to 449.9 µS/cm. These values were generally within the acceptable limit of 400 µS/cm set by the WHO, with exceptions observed in specific areas including Ward No. 5 (Satya Nagar & Mancheswar Village), Ward No. 10 (IDCO Colony, Mancheswar Industrial Estate, Votpada Village), Ward No. 14 (NiladriVihar S 5), Ward No. 16 (NALCO Nagar, Mayfair area), and Ward No. 20 (Nilachakra Nagar). The higher EC values observed in Wards 5, 10, 14, 16, and 20 indicate possible contamination from dissolved salts, which could originate from industrial effluents, natural groundwater salinity, or seepage of wastewater.

3.1.4. Alkalinity

Alkalinity levels in the northern zone of Bhubaneswar, illustrated in Figure 3D, varied significantly: the SPM Park area in Ward No. 17 showed the lowest level at 16 mg/L, and the Hanspal Village area in Ward No. 4 registered the highest at 128 mg/L. This variation is attributed to differences in the concentration of minerals, particularly carbonates and bicarbonates, present in the water across different wards. Alkalinity across the study area was within the permissible limit of IS 10500.

3.1.5. Hardness

Water hardness in the study area is primarily attributed to elevated concentrations of alkaline earth ions, specifically calcium (Ca²⁺) and magnesium (Mg²⁺) [49,50]. As shown in Figure 3E, hardness levels ranged from 13.1 mg/L to 262.4 mg/L. High hardness in the supply water appeared to be influenced by the proximity of certain areas to sewage drains.

3.1.6. Chloride

The chloride ion concentration in the supply water ranged from 21.3 mg/L to 87.0 mg/L, as depicted in Figure 3F. The Metro Home Area of Ward No. 6 had the lowest chloride concentration, while Hanspal Village in Ward No. 4 registered the highest. Nevertheless, all the wards assessed exhibited chloride levels within the permissible range. Chloride (Cl⁻) occurs naturally in the environment from sources such as suspended salt particles, soil porosity and absorbency, residual food waste, and farm manures used in agricultural fields.

3.1.7. TDSs

Total Dissolved Solids (TDSs) values in the supply water ranged from 44.0 mg/L to 312.2 mg/L, as shown in Figure 3G. The highest value was recorded in Ward No. 10, specifically in Votapada Village (312.2 mg/L). This elevated value may have resulted from higher ambient temperatures that facilitated weathering processes, enhanced ion exchange capacity, promoted desorption, and accelerated the dissolution of minerals. The increased temperature also appeared to elevate both pH and EC in the area. Since Ward No. 10 recorded the highest values for both pH and EC, a direct relationship between TDS and these parameters is inferred.

3.1.8. Biochemical Oxygen Demand

BOD (Biochemical Oxygen Demand) values ranged from 0.6 mg/L to 5.0 mg/L, as illustrated in Figure 3H. The highest BOD concentration was measured in Ward No. 3 (KananVihar Phase-1), while the lowest was recorded in Ward No. 6 (Pragati Vihar). The elevated BOD levels, particularly in Ward No. 3, are attributed to increased biological activity, which is often linked to warmer temperatures. BOD is a crucial parameter in stream pollution control and is particularly important for regulating organic load to maintain the required levels of DO.

3.1.9. Identification of Contaminated Areas

Deviations in key water quality parameters, namely pH, DO, EC, and hardness, were spatially mapped using ArcMap 10.5 and are depicted in Figure 4a–d. These maps highlight areas of significant environmental and public health concern. Overall, 12.6% of the sampled areas exhibited pH values outside the acceptable limits defined by IS 10500. The pH exceedances were primarily observed in Chirakhol Toil Slum, PrashantiVihar, and KananVihar Phase 1, and are attributed to sewage contamination and informal waste disposal practices. Similarly, high DO values were detected in localities including Sikharchandi Nagar, Adarsha Vihar, and NiladriVihar. These are explained by anthropogenic pressures in urbanized regions, such as construction activities, sewage infiltration, and surface runoff. Overall, 21.4% of the areas deviated from standard DO levels.

The WQI for each ward in the northern zone of Bhubaneswar was calculated using the weighted arithmetic index method [51]. The WQI calculation for Ward No. 1 is presented in Table 2, the detailed water quality analysis results are tabulated in Table 3, and Table 4 contains the WQI values for all wards, based on predefined standards. As illustrated in Figure 5, the WQI values fell within a range indicating that the water was suitable for drinking. Of the water samples, 28.6% fell within the ‘excellent quality’ category, characterized by WQI values ranging from 0 to 25 (Table 4), and 71.4% were categorized as ‘good quality’, with values ranging from 26 to 50. These results indicate that the water quality in Bhubaneswar has remained largely uncontaminated to date.
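The weighted arithmetic calculation can be sketched in a few lines of Python. The sketch assumes the commonly used form of the method: a quality rating q = 100 × (V − V_ideal)/(S − V_ideal), a unit weight w = K/S with K = 1/Σ(1/S), and WQI = Σ(w·q)/Σw. The function names, the four-parameter subset, the measured values, and the standard values below are illustrative assumptions and are not the entries of Table 2. Only the ‘excellent’ and ‘good’ bands come from the text above.

```python
def weighted_arithmetic_wqi(measured, standards, ideals):
    """Weighted arithmetic WQI for one sample.

    measured and standards are dicts keyed by parameter name; ideals holds the
    ideal value of a parameter in pure water (treated as 0 when not listed).
    """
    k = 1.0 / sum(1.0 / standards[name] for name in measured)  # proportionality constant K
    weighted_sum = 0.0
    weight_total = 0.0
    for name, value in measured.items():
        ideal = ideals.get(name, 0.0)
        q = 100.0 * (value - ideal) / (standards[name] - ideal)  # quality rating
        w = k / standards[name]  # unit weight, inversely proportional to the standard
        weighted_sum += w * q
        weight_total += w
    return weighted_sum / weight_total


def classify_wqi(wqi):
    """Bands reported in the text: up to 25 is excellent, above 25 up to 50 is good."""
    if wqi <= 25.0:
        return "excellent"
    if wqi <= 50.0:
        return "good"
    return "poorer than good"  # bands above 50 are not discussed in this section


# Illustrative inputs only (assumed standards; units are mg/L except pH).
standards = {"pH": 8.5, "TDS": 500.0, "hardness": 200.0, "chloride": 250.0}
ideals = {"pH": 7.0}
sample = {"pH": 7.4, "TDS": 150.0, "hardness": 90.0, "chloride": 40.0}

wqi = weighted_arithmetic_wqi(sample, standards, ideals)
print(round(wqi, 1), classify_wqi(wqi))
```

Because the unit weight is inversely proportional to the standard value, parameters with small permissible limits (pH in this sketch) dominate the index, which is a known property of the weighted arithmetic method.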

Deviations in specific parameters, such as pH, EC, DO, and hardness, were mapped using ArcGIS and visualized in Figure 4a–d. These anomalies are likely the result of aging pipeline infrastructure and corrosion within the distribution network, as reported by the Public Health Engineering Department (PHED) of Bhubaneswar.

In the ArcGIS maps (Figure 4a–d), red circular dots mark the wards where water quality parameter values surpassed critical thresholds: for instance, pH values outside the acceptable range of 6.5–8.5, EC values exceeding 300 µS/cm, and hardness levels greater than 200 ppm. Pink numeric labels adjacent to these dots give the location serial numbers of the affected areas. The deviations in some wards were confirmed to be linked to sewage contamination and issues within the plumbing systems. PHED was reported to have taken corrective steps by upgrading and modifying the pipeline infrastructure in the affected areas.

3.2. Graphical Presentation of Water Quality Index

The ward-wise WQI is illustrated in Figure 5. Wards 5 and 8 had WQI values of 48.7 and 45.2, respectively. Ward Nos. 1, 2, 17, 20, and 21 recorded WQI values below 25.0, indicating excellent water quality. The remaining wards fell within the range of 25.1 to 45.0, which corresponds to good water quality.

3.3. Principal Component Analysis

PCA was used to analyze the original monitoring data to reduce computational complexity and identify the influence of various parameters on water quality. The primary objective of PCA was to extract the key representative characteristics of the water environment into a set of independent variables known as principal components. PCA identified correlations among the geochemical data, contributing to the understanding of the depositional climate. In this study, PCA was performed on eight physicochemical water quality parameters across 26 wards of Bhubaneswar city.

As shown in Table 4, the correlation coefficient matrix obtained using OriginPro 2023 software revealed a strong positive correlation between Total Dissolved Solids (TDSs) and Hardness (r = 0.8), commonly attributed to the presence of calcium and magnesium salts. EC and Hardness also showed a strong correlation (r ≈ 0.8), indicating that divalent ions largely influence the ionic strength in water. Moderate positive correlations were observed between pH and EC, pH and Alkalinity, as well as DO and Hardness. These relationships suggest that pH regulation is closely linked to the buffering effect of dissolved carbonates, while DO variability may be influenced by interactions with metal ions and biological activity. These inter-parameter dependencies reflect a complex interplay of hydrogeochemical and anthropogenic factors, highlighting the importance of integrated interpretation of water quality indicators.

Calcium and magnesium salts were found to account for the strong association observed between total dissolved solids (TDSs) and water hardness. A similar positive correlation exceeding 0.58 was noted among pH, conductivity, hardness, and alkalinity.

The scree plot presented in Figure 6 illustrates that the slope became flatter after the third principal component, indicating diminishing variance contributions. The first principal component (PC1) accounted for 48.0% of the variance and showed strong positive loadings from pH (0.4), conductivity (0.5), hardness (0.5), and TDSs (0.5). The second principal component (PC2), which explained approximately 17.0% of the total variance, was primarily influenced by alkalinity (r = 0.6) and showed a negative correlation with DO (r = 0.7). The loadings of PC1 and PC2 indicated the dominance of ionic parameters, likely linked to the presence of calcium and magnesium salts in the water.

As detailed in Table 5, PC1 contributed 48.0% of the total variation, with notable inputs from EC, hardness, and pH, while PC2 explained 17.0% and was dominated by alkalinity and DO. Chloride, TDS, and BOD made comparatively smaller contributions.
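The link between the correlation matrix and the variance explained by PC1 can be sketched as follows. The snippet computes Pearson correlations in plain Python and then estimates the leading eigenvalue of the correlation matrix by power iteration, which is used here only to keep the sketch dependency-free. The study itself used OriginPro, so this is not the authors' procedure. The six-sample values for TDS, hardness, EC, and DO are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Square matrix of pairwise Pearson correlations between the given columns."""
    return [[pearson_r(a, b) for b in columns] for a in columns]


def first_component(corr, iterations=500):
    """Leading eigenpair of the correlation matrix, estimated by power iteration.

    Returns (eigenvalue, unit-length loading vector, share of total variance).
    """
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p)) for i in range(p))
    # For standardized variables the total variance equals the number of variables.
    return eigenvalue, v, eigenvalue / p


# Hypothetical measurements for six sampling locations.
tds = [60, 95, 140, 180, 240, 300]
hardness = [20, 45, 70, 110, 150, 230]
ec = [90, 150, 210, 280, 350, 440]
do = [8.1, 7.5, 7.9, 7.4, 8.0, 7.6]

corr = correlation_matrix([tds, hardness, ec, do])
eigenvalue, loadings, explained = first_component(corr)
print(round(corr[0][1], 2), round(explained, 2))
```

When several ionic parameters are strongly inter-correlated, as TDS, hardness, and EC are in this toy data, they load on PC1 with the same sign and PC1 absorbs a large share of the total variance. This is the pattern described for PC1 in the study.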

The scores plot in Figure 7 identified four groups of wards. Group 1 lay negatively along PC1 and positively along PC2, whereas Group 4 appeared in the opposite quadrant. Group 2 displayed a strong positive association with both components, suggesting a significant ionic influence. Groups 3, 1, and 17 occupied the negative quadrant for both PC1 and PC2, indicating lower concentrations of metal ions. These spatial groupings and component loadings reflect underlying hydro-chemical interactions, supporting policy-oriented interventions as discussed in Section 4.

Model performance was evaluated using 5-fold Stratified Cross-Validation (CV) to ensure balanced testing and result robustness. Metrics such as accuracy, precision, recall, and F1-score were computed for each fold and summarized with their mean and standard deviation. The validation process confirmed consistent model performance across all subsets (Table 6). Six classifiers were tested: Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), k-Nearest Neighbors (KNN), and Naive Bayes (NB). Of these, DT and RF achieved the highest accuracy of 91.7%, with RF also leading in precision at 92.7%. SVM, KNN, and NB followed closely, with accuracies of around 89.6%, while LR recorded the lowest accuracy at 87.5%. These findings underscore the strength of tree-based models for water quality classification, offering practical tools for predictive assessment and environmental monitoring.
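The per-fold metrics and their mean and standard deviation can be illustrated with a short, dependency-free Python sketch. It assumes macro averaging over classes, which the text does not specify. The function names and the two tiny folds of true and predicted labels are hypothetical.

```python
from statistics import mean, stdev


def fold_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 for one CV fold."""
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    return {
        "accuracy": accuracy,
        "precision": mean(precisions),
        "recall": mean(recalls),
        "f1": mean(f1s),
    }


def summarize(per_fold):
    """Mean and sample standard deviation of each metric across folds."""
    return {
        name: (mean([m[name] for m in per_fold]), stdev([m[name] for m in per_fold]))
        for name in per_fold[0]
    }


# Two hypothetical folds of true and predicted WQI classes.
fold_1 = fold_metrics(
    ["good", "good", "excellent", "good"],
    ["good", "good", "excellent", "excellent"],
)
fold_2 = fold_metrics(
    ["good", "excellent", "good", "good"],
    ["good", "excellent", "good", "good"],
)
summary = summarize([fold_1, fold_2])
print(summary)
```

Reporting the standard deviation alongside the mean matters for a dataset of this size, because a single misclassified sample in a small fold shifts that fold's accuracy by several percentage points.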

Hyperparameter Tuning and Function Selection

As listed in Table 7, the hyperparameter settings for all six classifiers were retained at their respective library defaults to establish a baseline for performance comparison. For Logistic Regression, an ℓ₂ penalty was applied with a regularization strength of C = 1.0, solved using the ‘lbfgs’ optimizer in multiclass mode, with max_iter set to 100 and a tolerance of 1 × 10⁻⁴. The Decision Tree model used the Gini impurity criterion, with no depth restriction, a minimum of two samples required to split an internal node, and a minimum of one sample per leaf. The Random Forest ensemble consisted of 100 bootstrap-aggregated trees, each constructed using Gini impurity, with no depth limit, and automatic feature selection applied per split. The SVM classifier used a radial basis function (RBF) kernel (C = 1.0, γ = ‘scale’, tol = 1 × 10⁻³). The K-Nearest Neighbors algorithm evaluated the five nearest neighbors using Euclidean distance (p = 2) and employed a uniform voting strategy. Lastly, Gaussian Naïve Bayes was implemented with no class priors and a variance smoothing parameter of 1 × 10⁻⁹. These configurations provided consistent, reproducible conditions for evaluating and benchmarking classification performance before any further hyperparameter tuning.
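A baseline of this kind could be set up roughly as follows. The sketch assumes that the library referred to is scikit-learn, since the parameter names in the text match its defaults, although the text does not name the library. The data are synthetic and generated with `make_classification`. The random seeds, the sample size, the shuffled folds, and the use of macro-averaged scores are all assumptions made for illustration, so the resulting numbers are unrelated to Table 6.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for eight physicochemical features and two WQI classes.
X, y = make_classification(
    n_samples=120, n_features=8, n_informative=5, n_redundant=1,
    weights=[0.3, 0.7], random_state=0,
)

# Settings described in the text; the L2 penalty is the LogisticRegression default.
models = {
    "LR": LogisticRegression(C=1.0, solver="lbfgs", max_iter=100, tol=1e-4),
    "DT": DecisionTreeClassifier(criterion="gini", max_depth=None,
                                 min_samples_split=2, min_samples_leaf=1,
                                 random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, criterion="gini",
                                 max_depth=None, random_state=0),
    "SVM": SVC(kernel="rbf", C=1.0, gamma="scale", tol=1e-3),
    "KNN": KNeighborsClassifier(n_neighbors=5, p=2, weights="uniform"),
    "NB": GaussianNB(priors=None, var_smoothing=1e-9),
}

# Stratified folds keep the class proportions similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = ["accuracy", "precision_macro", "recall_macro", "f1_macro"]

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    results[name] = {
        metric: (scores[f"test_{metric}"].mean(), scores[f"test_{metric}"].std())
        for metric in scoring
    }

for name, metrics in results.items():
    acc_mean, acc_std = metrics["accuracy"]
    print(f"{name}: accuracy {acc_mean:.3f} ± {acc_std:.3f}")
```

Fixing `random_state` for the tree-based models and for the fold generator is what makes such a baseline reproducible. Without it, DT and RF scores can differ between runs even with identical hyperparameters.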

4. Discussion

The overall quality of tap water in Bhubaneswar was generally good, with most of the surveyed wards falling in the “good to excellent” range on the WQI: 28.6% of the areas were assessed as having excellent water quality, while 71.4% were categorized as having good water quality. This indicates that a safe municipal supply system is generally in place. The variations in pH in Ward Nos. 26 and 20 were likely caused by factors such as leaching of organic matter, bacterial activity, and occasional use of fertilizers in gardens. Additionally, acidity could have resulted from elevated CO₂ due to decomposition of organic matter, adsorption of metal anions, and the presence of non-metallic compounds such as fluoride. Deviations in pH levels are known to irritate the skin and exacerbate health issues such as eczema and gastrointestinal distress [52]. The elevated pH levels in areas such as Chirakhol Toli Slum and Prashanti Vihar (Ward 1) could indicate waste disposal issues or a reaction to alkaline buffering agents within the pipeline network [53].

In addition, the elevated DO levels in Sikharchandi Nagar might suggest natural reaeration; however, in areas with aging pipes, such levels could also indicate organic contamination [54]. Factors such as geomorphological characteristics and anthropogenic activities contribute to pollution, which could decrease the DO concentration to levels below those considered essential [55]. Particles such as silts, clays, and sewage debris absorb sunlight, thereby increasing the water surface temperature and reducing DO levels. Such a decline in DO carries significant health implications for aquatic plants and animals, which depend on adequate levels of DO for survival and vital metabolic processes [56].

The elevated EC levels in PHD Colony, Chakeisiani Area, and Satya Nagar (all in Ward 5) reflect the impact of industrial operations, urban development, and improper waste management, which introduce pollutants such as chloride, sulphate, and nitrate into water bodies that could cause cancer of the colon and rectum [57]. Elevated EC levels in drinking water may have health implications, particularly due to excessive intake of dissolved minerals and salts, and prolonged consumption could increase the risk of hypertension and cardiovascular diseases. The study area contains significant mineral deposits, and water percolating through the soil can dissolve these minerals, thereby increasing alkalinity. Nevertheless, alkalinity levels remained within the permissible limits set by IS 10500 across all surveyed locations [58].

Excessive water hardness in areas such as Mancheswar Village (Ward 5) necessitates water softening [59]. The elevated hardness is most likely influenced by the leaching of calcium and magnesium from corroded pipeline materials, combined with subsurface geogenic contributions, such as the dissolution of carbonate minerals, including calcite and dolomite, present in the underlying geological formations [60]. Exposure to water with high chloride concentrations could have adverse effects on the skin, and chloride ions are highly polarizable in water, which can accelerate corrosion reactions [61]. Chloride ions also degrade reinforced concrete (e.g., in bridges), contributing to structural ageing, and cause corrosion in boiler systems by eroding pipes exposed to chloride-rich steam [62].

Temperature-dependent processes, such as ion exchange and desorption, played a crucial role in increasing TDS levels and contributed to an unpleasant taste in drinking water. Consumption of piped water with elevated TDS levels could also lead to gastrointestinal discomfort, cardiac problems, and kidney stone formation [63]. The PCA demonstrated marked positive correlations among key hydro-chemical parameters, especially TDS with hardness and EC with hardness [64]. These relationships reflect the contributions of mineral dissolution, ion concentration resulting from human activities, and aging delivery systems that add ions to the supplied tap water [65]. The mapping in ArcGIS proved essential for visualizing contamination hotspots, facilitating the recognition of priority zones for action [66]. This combined approach serves as a water-quality management tool for metropolitan and town planners by providing reliable, scientifically based, and prioritized evidence for the need to improve water supply systems [67].

Machine learning coupled with geo-spatial inputs improves urban tap water quality assessment by converting complex physicochemical data into accurate predictive models [68]. Combining traditional methods (WQI, PCA) with supervised classifiers, the study overcomes static thresholds and fragmented monitoring, achieving over 91% accuracy in classifying water quality across 21 wards [66]. This scalable, real-time framework supports data-driven decisions for the safeguarding of drinking water in rapidly urbanizing North Bhubaneswar [69]. The findings were validated by the Public Health Engineering Department (PHED), Government of Odisha, which recommended addressing encroachment issues in various wards.

Although the study itself does not directly enhance water quality, it provides a thorough assessment of the evaluated water conditions, enabling stakeholders to formulate appropriate water management practices [70]. These findings further justify protective measures, such as continuous monitoring, controlled governance of wastewater, and the maintenance of distribution systems, to ensure the safety of urban water supplies following rapid urbanization [71]. Moreover, in the case study’s lab assessment, although analysis was conducted according to the APHA 2012 guidelines, sample processing protocols at the time referenced newer APHA editions, such as 2017 and 2022, as prospective references for future research aimed at meeting changing international benchmarks [72,73,74].

4.1. Comparison and Implications

Studies conducted in metropolitan cities in India revealed higher WAWQI scores, which indicate poorer water quality. By contrast, the study area falls within the “good” to “excellent” range; even so, improved water quality management and attention to environmental and social factors remain necessary to achieve a sustainable water supply [75]. The findings have significant implications for decision-makers, environmentalists, academics, and other stakeholders, as the results underscore the need for interventions to address localized contamination hotspots and enhance wastewater treatment efficiency, thereby ensuring an adequate water supply infrastructure [76].

Furthermore, the strong correlation (r = 0.8) between TDS and hardness, as well as their dominant influence in PC1 (48.0% variance), highlights the need for targeted interventions. These parameters, often influenced by calcium and magnesium salts, contribute to pipeline scaling and degradation of water quality. Policymakers should consider regular monitoring of ionic concentrations, implementing localized softening units in high-risk wards, and prioritizing infrastructure audits based on PCA-derived ward groupings.

4.2. Study Limitations

Despite the comprehensive nature of this study, certain limitations provide opportunities for further research. Firstly, the analysis was restricted to the North Zone of Bhubaneswar, which may not fully capture the spatial variability of water quality across the entire city. Extending the assessment to include the Southeast and Southwest zones would enable a more holistic understanding of municipal water supply conditions and help identify zone-specific vulnerabilities. Secondly, the scope was limited to physicochemical parameters, and a detailed microbiological evaluation focusing on bacteriological activity and potential microbial contamination within the water supply network is recommended to complement the current findings [77]. Finally, incorporating advanced statistical approaches, such as multivariate analysis and time-series modelling, could improve data interpretation by uncovering deeper spatial-temporal patterns in water quality variation, thereby supporting more precise and informed decision-making [78].

4.3. Recommendation and Suggestion

Enhancing waste management practices through segregation, recycling, and proper disposal of industrial and hazardous waste, together with upgrading water treatment facilities with modern technologies, is pivotal. Establishing real-time water quality monitoring systems, strengthening enforcement of water pollution regulations, and conducting regular audits and inspections are also essential. Detailed studies of specific pollutants, long-term water quality monitoring, and the development of advanced assessment methods integrated with statistical models and machine learning algorithms are also required. Additionally, research priorities should focus on pollutant-specific analysis, health risk assessment, and evaluation of water management policies to inform evidence-based decision-making and ensure a safe and sustainable water supply in Bhubaneswar. Based on PCA findings, introducing ion-specific softening modules in municipal treatment plants and tracking variations in mineral load could further enhance water quality control.

4.4. Conclusion

The assessment of tap water quality revealed that most wards exhibited favorable water quality, ranging from good to excellent classifications. The WQI calculations showed that 28.6% of the regions had excellent water quality, while 71.4% fell under the good water quality range. The analysis of individual water quality parameters revealed localized variations, with specific areas exhibiting high pH values, elevated DO levels, or high hardness levels. These findings provide a comprehensive understanding of the current state of tap water quality in the study area, serving as a valuable resource for stakeholders and informing future water management strategies.

This study highlights the strength of combining geospatial tools and predictive modeling for spatial water quality assessment, examining the complex relationships between climate variability and hydrological responses. Based on these results, we recommend that policymakers and stakeholders take initiatives to mitigate the adverse impacts of water pollution in the prone areas, which will promote sustainable water management and ecosystem resilience.

Author Contributions

Kshyana Prava Samal (K.P.S.) and Rakesh Ranjan Thakur (R.R.T.): conceptualization and writing of the original draft; Alok Kumar Panda (A.K.P.), Debabrata Nandi (D.N.), Alok Kumar Pati (A.K.P.), Kumarjeeb Pegu (K.P.) and Bojan Đurin (B.Đ.) contributed to all sections; R.R.T. and B.Đ. contributed to supervision and review and editing of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All data are available from the corresponding author upon reasonable request.

Acknowledgments

The authors would like to express gratitude for the support of the University North during this research, within the scientific project ‘Hydrological and geodetic analysis of the watercourse-second part’, 2025.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

Satrovic, E.; Cetindas, A.; Akben, I.; Damrah, S. Do Natural Resource Dependence, Economic Growth and Transport Energy Consumption Accelerate Ecological Footprint in the Most Innovative Countries? The Moderating Role of Technological Innovation. Gondwana Res. 2024, 127, 116–130.
Chu, E.; Karr, J. Environmental Impact: Concept, Consequences, Measurement; Elsevier: Amsterdam, The Netherlands, 2016.
Shayo, G.M.; Elimbinzi, E.; Shao, G.N.; Fabian, C. Severity of Waterborne Diseases in Developing Countries and the Effectiveness of Ceramic Filters for Improving Water Quality. Bull. Natl. Res. Cent. 2023, 47, 113.
Temesgen, G.; Lelago, A.; Assefa, E.; Admasie, A. Evaluation of Chlorination Efficiency on Improving Microbiological and Physicochemical Parameters in Water Samples Available in Sheble Berenta District Amhara Region, Ethiopia. Appl. Water Sci. 2023, 13, 120.
Mishra, P.; Jena, D.; Giri, N.C.; Thakur, R.R.; Dash, D.N. Land-Surface Temperature Dynamics in the Fringes of North Bhubaneswar, India: An Empirical Analysis. Curr. Sci. 2024, 127, 222.
Panda, S.; Parida, C.; Azharunnisa, A.; Thakur, R.R. Integrating Nature-Based Solutions in the Urbanization Process by Urban Agriculture: A Case of Bhubaneswar City, India. Discov. Sustain. 2024, 5, 286.
Prusty, P.; Farooq, S. Seawater Intrusion in the Coastal Aquifers of India-A Review. Hydro Res. 2020, 3, 61–74.
Lin, L.; Yang, H.; Xu, X. Effects of Water Pollution on Human Health and Disease Heterogeneity: A Review. Front. Environ. Sci. 2022, 10, 880246.
Singh, N.; Poonia, T.; Siwal, S.S.; Srivastav, A.L.; Sharma, H.K.; Mittal, S.K. Chapter 9—Challenges of Water Contamination in Urban Areas. In Current Directions in Water Scarcity Research; Urban Water Crisis and Management–Strategies for Sustainable Development; Elsevier: Amsterdam, The Netherlands, 2022; Volume 6, pp. 173–202.
Chapman, D.V.; Sullivan, T. The Role of Water Quality Monitoring in the Sustainable Use of Ambient Waters. One Earth 2022, 5, 132–137.
Benam-Beltoungou, E.Y.T.; Bassene, I.; Emvoutou, H.C.; Akpataku, K.V.; Diongue, D.M.L.; Faye, S. Groundwater Quality Assessed Using Water Quality Indices and Geostatistical Methods in the Thiaroye Aquifer, Senegal. Water Sci. 2025, 39, 151–170.
Luo, H.; Nong, X.; Xia, H.; Liu, H.; Zhong, L.; Feng, Y.; Zhou, W.; Lu, Y. Integrating Water Quality Index (WQI) and Multivariate Statistics for Regional Surface Water Quality Evaluation: Key Parameter Identification and Human Health Risk Assessment. Water 2024, 16, 3412.
Raphela, T.; Manqele, N.; Erasmus, M. The Impact of Improper Waste Disposal on Human Health and the Environment: A Case of Umgungundlovu District in KwaZulu Natal Province, South Africa. Front. Sustain. 2024, 5, 1386047.
Singh, B.J.; Chakraborty, A.; Sehgal, R. A Systematic Review of Industrial Wastewater Management: Evaluating Challenges and Enablers. J. Environ. Manag. 2023, 348, 119230.
George-Williams, H.E.M.; Hunt, D.V.L.; Rogers, C.D.F. Sustainable Water Infrastructure: Visions and Options for Sub-Saharan Africa. Sustainability 2024, 16, 1592.
Chidiac, S.; El Najjar, P.; Ouaini, N.; El Rayess, Y.; El Azzi, D. A Comprehensive Review of Water Quality Indices (WQIs): History, Models, Attempts and Perspectives. Rev. Environ. Sci. Bio/Technol. 2023, 22, 349–395.
Kumar, D.; Kumar, R.; Sharma, M.; Awasthi, A.; Kumar, M. Global Water Quality Indices: Development, Implications, and Limitations. Total Environ. Adv. 2024, 9, 200095.
Akhtar, N.; Ishak, M.I.S.; Ahmad, M.I.; Umar, K.; Yusuff, M.S.M.; Anees, M.T.; Qadir, A.; Almanasir, Y.K.A. Modification of the Water Quality Index (WQI) Process for Simple Calculation Using the Multi-Criteria Decision-Making (MCDM) Method: A Review. Water 2021, 13, 905.
Lohani, N. Drinking water quality management through various physico-chemical parameters and health hazard problems with their remedial measures in Bhubaneswar city of Odisha, India. Int. J. Environ. Sci. 2012, 2, 1192–1210.
Hossain, M.N.; Howladar, M.F.; Ahammed, S.; Haque, M.R.; Khan, M.I.; Hasan, M.; Chowdhury, T.R.; Hosain, A. Application of multi-indexing approach within a GIS framework to investigate the quality and contamination of ground water in Barisal sadar, Bangladesh. Heliyon 2025, 11, e42262.
Oseke, F.I.; Anornu, G.K.; Adjei, K.A.; Eduvie, M.O. Assessment of water quality using GIS techniques and water quality index in reservoirs affected by water diversion. Water Energy Nexus 2021, 4, 25–34.
Siddiqua, A.; Hahladakis, J.N.; Al-Attiya, W.A. An Overview of the Environmental Pollution and Health Effects Associated with Waste Landfilling and Open Dumping. Environ. Sci. Pollut. Res. Int. 2022, 29, 58514–58536.
Nawaz, R.; Nasim, I.; Irfan, A.; Islam, A.; Naeem, A.; Ghani, N.; Irshad, M.A.; Latif, M.; Nisa, B.U.; Ullah, R. Water Quality Index and Human Health Risk Assessment of Drinking Water in Selected Urban Areas of a Mega City. Toxics 2023, 11, 577.
Masood, A.; Aslam, M.; Pham, Q.B.; Khan, W.; Masood, S. Integrating Water Quality Index, GIS and Multivariate Statistical Techniques towards a Better Understanding of Drinking Water Quality. Environ. Sci. Pollut. Res. Int. 2022, 29, 26860–26876.
Unto, P.B. The Effect of Water Loss on Demand–Supply Departure of Addis Ababa, Ethiopia. Discov. Water 2024, 8, 89.
Cosgrove, W.J.; Loucks, D.P. Water Management: Current and Future Challenges and Research Directions. Water Resour. Res. 2015, 51, 4823–4839.
Wudil, A.H.; Ali, A.; Usman, M.; Radulescu, M.; Sass, R.; Prus, P.; Musa, S. Effects of Inequality of Access to Irrigation and Water Productivity on Paddy Yield in Nigeria. Agronomy 2023, 13, 2195.
Kuchimanchi, B.R.; Ripoll-Bosch, R.; Steenstra, F.A.; Thomas, R.; Oosting, S.J. The Impact of Intensive Farming Systems on Groundwater Availability in Dryland Environments: A Watershed Level Study from Telangana, India. Curr. Res. Environ. Sustain. 2023, 5, 100198.
Nti, E.K.; Cobbina, S.J.; Attafuah, E.E.; Senanu, L.D.; Amenyeku, G.; Gyan, M.A.; Forson, D.; Safo, A. Water Pollution Control and Revitalization Using Advanced Technologies: Uncovering Artificial Intelligence Options towards Environmental Health Protection, Sustainability and Water Security. Heliyon 2023, 9, e18170.
Pradhan, B.K.; Yadav, S.; Ghosh, J.; Prashad, A. Achieving the Sustainable Development Goals (SDGs) in the Indian State of Odisha: Challenges and Opportunities. World Dev. Sustain. 2023, 3, 100078.
United Nations. Sustainable Development Goals. United Nations. 2024. Available online: https://www.un.org/sustainabledevelopment/sustainable-development-goals/ (accessed on 15 January 2025).
Kayitesi, N.M.; Guzha, A.C.; Mariethoz, G. Impacts of Land Use Land Cover Change and Climate Change on River Hydro-morphology-a Review of Research Studies in Tropical Regions. J. Hydrol. 2022, 615, 128702.
Naha, S.; Rico-Ramirez, M.A.; Rosolem, R. Quantifying the Impacts of Land Cover Change on Hydrological Responses in the Mahanadi River Basin in India. Hydrol. Earth Syst. Sci. 2021, 25, 6339–6357.
Sharma, P.; Bora, P.J. Water quality assessment using water quality index and principal component analysis: A case study of historically important lakes of Guwahati City, North-East India. Appl. Ecol. Environ. Sci. 2020, 8, 207–217.
Dandapat, A.K.; Panda, P.K.; Sankalp, S. A Topographical Investigation of Twenty-Nine Geomorphological Parameters over Kuladara Watershed, Odisha. AIP Conf. Proceeding 2023, 2831, 050011.
Nayak, P.; Mohanty, A.K.; Samal, P.; Khaoash, S.; Mishra, P. Groundwater Quality, Hydrogeochemical Characteristics, and Potential Health Risk Assessment in the Bhubaneswar City of Eastern India. Water Air Soil Pollut. 2023, 234, 609.
Office of the Registrar General & Census Commissioner, India. District Census Handbook: Khordha, Odisha. Series 22, Part XII-A & B, Census of India 2011; Directorate of Census Operations: Odisha, India, 2011.
Malik, S.; Pal, S.C. Is the Topography Playing a Dual Role in Controlling Downstream Channel Morphology of a Monsoon Dominated Dwarkeswar River, Eastern India? Hydro Res. 2020, 3, 15–31. [Google Scholar] [CrossRef]
Tan, S.Y.; Taeihagh, A. Smart City Governance in Developing Countries: A Systematic Literature Review. Sustainability 2020, 12, 899. [Google Scholar] [CrossRef]
Al Qahtani, T.; Yaakob, M.S.; Yidris, N.; Sulaiman, S.; Ahmad, K.A. A review on water leakage detection method in the water distribution network. J. Adv. Res. Fluid Mech. Therm. Sci. 2020, 68, 152–163. [Google Scholar] [CrossRef]
Shaikh, M.M.; MohdHanafiah, M.; Basheer, A.O. Leaching of organic toxic compounds from PVC water pipes in Medina Al-Munawarah, Kingdom of Saudi Arabia. Processes 2019, 7, 641. [Google Scholar] [CrossRef]
Vargas, I.T.; Pastén, P.A.; Pizarro, G.E. Empirical model for dissolved oxygen depletion during corrosion of drinking water copper pipes. Corros. Sci. 2010, 52, 2250–2257. [Google Scholar] [CrossRef]
Li, M.; Liu, Z.; Chen, Y.; Hai, Y. Characteristics of iron corrosion scales and water quality variations in drinking water distribution systems of different pipe materials. Water Res. 2016, 106, 593–603. [Google Scholar] [CrossRef]
American Public Health Association; American Water Works Association; Water Environment Federation. Standard Methods for the Examination of Water and Wastewater, 22nd ed.; American Public Health Association: Washington, DC, USA; American Water Works Association: Washington, DC, USA; Water Environment Federation: Alexandria, VA, USA, 2012. [Google Scholar]
An, L.; Zhang, Z.; Feng, J.; Lv, F.; Li, Y.; Wang, R.; Lu, M.; Gupta, R.B.; Xi, P.; Zhang, S. Heterostructure-Promoted Oxygen Electrocatalysis Enables Rechargeable Zinc–Air Battery with Neutral Aqueous Electrolyte. J. Am. Chem. Soc. 2018, 140, 17624–17631. [Google Scholar] [CrossRef]
Uddin, M.G.; Nash, S.; Olbert, A.I. A review of water quality index models and their use for assessing surface water quality. Ecol. Indic. 2020, 122, 107218. [Google Scholar] [CrossRef]
Atlas Scientific. Dissolved Oxygen in Drinking Water, 21 March 2022. Available online: https://atlas-scientific.com/blog/dissolved-oxygen-in-drinking-water/ (accessed on 20 January 2025).
Mishra, P.; Nandi, D.; Sahu, P.; Mohanta, K.; Edinur, H.; Sarkar, T.; Pati, S. Hydro-Geochemical Attributes Based Classifiers for Groundwater Analysis. Ecol. Eng. Environ. Technol. 2021, 22, 28–39. [Google Scholar] [CrossRef]
Juárez, J.E.R.; Alvarado Ma, A.; Zamarron, A.S.; González, O.A.; Hernandez, V.H.B.; Trujillo, E.O.; De Alba, Á.A.V. Chemical conditioning of drinking groundwater through Ca²⁺/Mg²⁺ ratio adjust as a treatment to reduce Ca precipitation: Batch assays and test bench experiments. J. Water Process Eng. 2023, 53, 103844. [Google Scholar] [CrossRef]
Elwood, J.M.; van der Werf, B. Nitrates in drinking water and cancers of the colon and rectum: A meta-analysis of epidemiological studies. Cancer Epidemiol. 2022, 78, 102148. [Google Scholar] [CrossRef] [PubMed]
Habuda-Stanić, M.; Ravančić, M.; Flanagan, A. A Review on Adsorption of Fluoride from Aqueous Solution. Materials 2014, 7, 6317–6366. [Google Scholar] [CrossRef] [PubMed]
Dewangan, S.K.; Shrivastava, S.K.; Tigga, V.; Lakra, M.; Namrata, P. Review paper on the role of pH in water quality implications for aquatic life, human health, and environmental sustainability. Int. Adv. Res. J. Sci. Eng. Technol. 2023, 10, 215–218. Available online: https://www.researchgate.net/publication/371539436_REVIEW_PAPER_ON_THE_ROLE_OF_PH_INWATER_QUALITY_IMPLICATIONS_FORAQUATIC_LIFE_HUMAN_HEALTH_ANDENVIRONMENTAL_SUSTAINABILITY (accessed on 20 January 2025).
Daniel, M.H.B.; Montebelo, A.A.; Bernardes, M.C.; Ometto, J.P.B.; de Camargo, P.B.; Krusche, A.V.; Balleater, M.V.; Victoria, R.L.; Martinelli, L.A. Effects of urban sewage on dissolved oxygen, dissolved inorganic and organic carbon, and electrical conductivity of small streams along a gradient of urbanization in the Piracicaba river basin. Water Air Soil Pollut. 2002, 136, 189–206. [Google Scholar] [CrossRef]
Barad, S.; Mishra, P.; Sahu, P.C.; Sarkar, T.; Amin, M.F.M.; Choudhury, T.; Edinur, H.A.; Kari, Z.A.; Nandi, D.; Pati, S. Comparative Approach of Decision Tree and CWQI Analysis for Classification of Groundwater with a Special Reference to Fluoride Ion in Drought-Prone Boudh District of Odisha, India. Sustain. Water Resour. Manag. 2021, 7, 94. [Google Scholar] [CrossRef]
Wagh, V.M.; Panaskar, D.B.; Jacobs, J.A.; Mukate, S.V.; Muley, A.A.; Kadam, A.K. Influence of hydro-geochemical processes on groundwater quality through geostatistical techniques in Kadava River basin, Western India. Arab. J. Geosci. 2018, 12, 7. [Google Scholar] [CrossRef]
Mukherjee, A.; Saha, D.; Harvey, C.F.; Taylor, R.G.; Ahmed, K.M.; Bhanja, S.N. Groundwater systems of the Indian Sub-Continent. J. Hydrol. Reg. Stud. 2015, 4, 1–14. [Google Scholar] [CrossRef]
Jodhani, K.H.; Gupta, N.; Dadia, S.; Patel, H.; Patel, D.; Jamjareegulgarn, P.; Singh, S.K.; Rathnayake, U. Sustainable groundwater management through water quality index and geochemical insights in Valsad India. Sci. Rep. 2025, 15, 8769. [Google Scholar] [CrossRef]
Rubenowitz-Lundin, E.; Hiscock, K.M. Water Hardness and Health Effects; Springer: Berlin/Heidelberg, Germany, 2012; pp. 337–350. [Google Scholar] [CrossRef]
Samal, K.P.; Pradhan, A.K.; Tarai, A. Assessment of Seasonal Variation of Water Quality in Bhubaneswar Urban Catchment Using Water Quality Index Method. In World Anthropology Congress; Atlantis Press: Dordrecht, The Netherlands, 2023; pp. 76–94. [Google Scholar] [CrossRef]
Pati, A.K.; Tripathy, A.R.; Nandi, D.; Thakur, R.R.; Pandey, M. Irrigation Water Quality Prognostication: An innovative ensemble architecture leveraging deep learning and machine learning for enhanced SAR and ESP estimation in the east coast of India. J. Environ. Chem. Eng. 2025, 13, 116433. [Google Scholar] [CrossRef]
Azoulay, A.; Garzon, P.; Eisenberg, M.J. Comparison of the mineral content of tap water and bottled waters. J. Gen. Intern. Med. 2001, 16, 168–175. [Google Scholar] [CrossRef] [PubMed]
Garcia, R.J.L.; da Silva Júnior, J.B.; Abreu, I.M.; Soares, S.A.R.; Araujo, R.G.O.; de Souza, E.S.; Ribeiro, H.J.S.; Hadlich, G.M.; de Souza Queiroz, A.F. Application of PCA and HCA in Geochemical Parameters to Distinguish Depositional Paleoenvironments from Source Rocks. J. S. Am. Earth Sci. 2020, 103, 102734. [Google Scholar] [CrossRef]
Gupta, M.; Biswas, R.; Kumar, A.; Tortajada, C. Consideration of water policies in the urban development plans of Delhi: A collaborative planning perspective. River 2024, 3, 228–244. [Google Scholar] [CrossRef]
Mohammadpour, A.; Gharehchahi, E.; Golaki, M.; Gharaghani, M.A.; Ahmadian, F.; Abolfathi, S.; Samaei, M.R.; Uddin, M.G.; Olbert, A.I.; Khaneghah, A.M. Advanced water quality assessment using machine learning: Source identification and probabilistic health risk analysis. Results Eng. 2025, 27, 105421. [Google Scholar] [CrossRef]
Sonar, C.; Hammadi, A.M.A.; Padme, Y.L. Water quality assessment using principal component analysis. In Communications in Computer and Information Science; Springer: Cham, Switzerland, 2024; pp. 88–97. [Google Scholar] [CrossRef]
Das, A. An Optimized Approach for Predicting Water Quality Features and A Performance evaluation for Mapping Surface Water Potential Zones Based on Discriminant Analysis (DA), Geographical Information System (GIS) and Machine Learning (ML) Models in Baitarani River Basin, Odisha. Desalin. Water Treat. 2025, 321, 101039. [Google Scholar] [CrossRef]
Obaideen, K.; Shehata, N.; Sayed, E.T.; Abdelkareem, M.A.; Mahmoud, M.S.; Olabi, A. The role of wastewater treatment in achieving sustainable development goals (SDGs) and sustainability guideline. Energy Nexus 2022, 7, 100112. [Google Scholar] [CrossRef]
Rammohan, B.; Partheeban, P.; Ranganathan, R.; Balaraman, S. Groundwater quality prediction and analysis using machine learning models and geospatial technology. Sustainability 2024, 16, 9848. [Google Scholar] [CrossRef]
Ajith, V.; Fishman, R.; Yosef, E.; Edris, S.; Ramesh, R.; Suresh, R.A.; Pras, A.; Rahim, V.; Rajendran, S.; Yanko, M.; et al. An integrated methodology for assessment of drinking-water quality in low-income settings. Environ. Dev. 2023, 46, 100862. [Google Scholar] [CrossRef]
Prapanchan, V.; Subramani, T.; Karunanidhi, D.; Gopinathan, P. Groundwater quality assessment for drinking and irrigation purposes and its human health risks in the Sevathur mine region, south India. Desalin. Water Treat. 2024, 320, 100883. [Google Scholar] [CrossRef]
Giao, N.T.; Nhien, H.T.H.; Anh, P.K.; Thuptimdang, P. Groundwater quality assessment for drinking purposes: A case study in the Mekong Delta, Vietnam. Sci. Rep. 2023, 13, 4380. [Google Scholar] [CrossRef]
Lukhabi, D.K.; Mensah, P.K.; Asare, N.K.; Pulumuka-Kamanga, T.; Ouma, K.O. Adapted Water Quality Indices: Limitations and potential for water quality monitoring in Africa. Water 2023, 15, 1736. [Google Scholar] [CrossRef]
Silva, J.A. Wastewater Treatment and Reuse for Sustainable Water Resources Management: A Systematic Literature review. Sustainability 2023, 15, 10940. [Google Scholar] [CrossRef]
Yamin, D.; Uskoković, V.; Wakil, A.; Goni, M.; Shamsuddin, S.; Mustafa, F.; Alfouzan, W.; Alissa, M.; Alshengeti, A.; Almaghrabi, R.; et al. Current and future technologies for the detection of Antibiotic-Resistant Bacteria. Diagnostics 2023, 13, 3246. [Google Scholar] [CrossRef] [PubMed]
Gao, Z.; Chen, J.; Wang, G.; Ren, S.; Fang, L.; Yinglan, A.; Wang, Q. A novel multivariate time series prediction of crucial water quality parameters with Long Short-Term Memory (LSTM) networks. J. Contam. Hydrol. 2023, 259, 104262. [Google Scholar] [CrossRef]
Uddin, M.G.; Rahman, A.; Taghikhah, F.R.; Olbert, A.I. Data-driven evolution of water quality models: An in-depth investigation of innovative outlier detection approaches-A case study of Irish Water Quality Index (IEWQI) model. Water Res. 2024, 255, 121499. [Google Scholar] [CrossRef]
Alkahtani, M.; Mallick, J.; Alqadhi, S.; Sarif, M.N.; Ahmed, M.F.M.; Abdo, H.G. Interpretation of Bayesian-optimized deep learning models for enhancing soil erosion susceptibility prediction and management: A case study of Eastern India. Geocarto Int. 2024, 39, 2367611. [Google Scholar] [CrossRef]
Lokman, A.; Ismail, W.Z.W.; Aziz, N.A.A. A review of water quality forecasting and classification using machine learning models and statistical analysis. Water 2025, 17, 2243. [Google Scholar] [CrossRef]

Figure 1. (a) India (country), (b) Odisha (state), (c) Khordha (district), (d) BMC, (e) Location map (study area).

Figure 2. Methodology flow chart.

Figure 3. Box-and-whisker plots comparing (A) pH, (B) DO, (C) EC, (D) Alkalinity, (E) Hardness, (F) Chloride, (G) TDS, and (H) BOD.

Figure 4. Maps of vulnerable areas and deviations for parameters (a) pH, (b) DO, (c) EC, and (d) Hardness WQI.

Figure 5. Water Quality Index of different wards, with reference lines at WQI = 25 and WQI = 50, indicating thresholds for quality classification.

Figure 6. Principal Component Number.

Figure 7. The Scree plot.

Table 1. Standard of Water Quality Index.

WQI Level	Status
0 to 25	Outstanding
26 to 50	Good
51 to 75	Poor
76 to 100	Very Poor
>100	Unhealthy for Drinking
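As an illustrative sketch of how the classes in Table 1 can be applied, the Python function below maps a WQI value to its status. The function name `wqi_status` is hypothetical, and treating each band's upper limit as inclusive (so that a fractional value such as 25.4 falls in the next band) is an assumption, since Table 1 lists integer limits only.

```python
def wqi_status(wqi: float) -> str:
    """Return the Table 1 status for a WQI value.

    Assumption: upper limits are inclusive, so values above 25 and up to 50
    are 'Good', values above 50 and up to 75 are 'Poor', and so on.
    """
    if wqi < 0:
        raise ValueError("WQI is expected to be non-negative")
    if wqi <= 25:
        return "Outstanding"
    if wqi <= 50:
        return "Good"
    if wqi <= 75:
        return "Poor"
    if wqi <= 100:
        return "Very Poor"
    return "Unhealthy for Drinking"


# Example with two ward values reported in Table 3.
status_ward_1 = wqi_status(18.2)   # Ward No. 1
status_ward_5 = wqi_status(48.7)   # Ward No. 5
```

With these band edges, a ward at exactly 25.0 is still classed as Outstanding; a different convention for the boundaries would shift only such edge cases.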

Table 2. Water Quality Index calculation of Ward No. 1 of the north zone of Bhubaneswar.

Sl No.	Parameter (As Per WHO)	Standard Value (s_i)	1/s_i	Unit Weight (w_i = K/s_i)	Observed Value	Quality Rating (q_i)	w_i q_i
1	pH	8.5	0.1	0.2	6.9	−5.3	−1.2
2	DO	5	0.2	0.4	7.5	0.7	0.3
3	Conductivity	300	0.003	0.01	129.1	43.0	0.3
4	Alkalinity	200	0.01	0.01	53	26.5	0.3
5	Hardness	200	0.01	0.01	52.5	26.2	0.2
6	Chloride	250	0.004	0.01	37.4	15.0	0.1
7	TDS	500	0.002	0.004	83.9	16.8	0.1
8	BOD	5	0.2	0.4	2.4	48.7	18.1
Sum			0.5	0.99 ≃ 1.0			18.2
K = 1/Σ(1/s_i) = 1.9; WQI = Σ w_i q_i / Σ w_i = 18.2
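The arithmetic behind Table 2 can be sketched in Python as follows. The standard values s_i are taken from the table; the function names are hypothetical, and the `ideal` argument (the ideal value of a parameter, zero for most parameters) is an assumption about the usual form of the Weighted Arithmetic Index method, because the ideal values for pH and DO are not listed in the table. For that reason the final index is summed from the printed w_i q_i column rather than recomputed from the rounded observations.

```python
# Standard values s_i as listed in Table 2.
STANDARDS = {
    "pH": 8.5, "DO": 5, "Conductivity": 300, "Alkalinity": 200,
    "Hardness": 200, "Chloride": 250, "TDS": 500, "BOD": 5,
}


def unit_weights(standards):
    """Return the proportionality constant K and unit weights w_i = K / s_i."""
    k = 1.0 / sum(1.0 / s for s in standards.values())
    return k, {name: k / s for name, s in standards.items()}


def quality_rating(observed, standard, ideal=0.0):
    """Quality rating q_i on a 0-100 scale (ideal value assumed, see lead-in)."""
    return 100.0 * (observed - ideal) / (standard - ideal)


def weighted_arithmetic_wqi(weights, ratings):
    """WQI = sum(w_i * q_i) / sum(w_i)."""
    return sum(weights[p] * ratings[p] for p in weights) / sum(weights.values())


k, weights = unit_weights(STANDARDS)

# Conductivity has an ideal value of zero, so its rating follows directly.
q_conductivity = quality_rating(129.1, STANDARDS["Conductivity"])

# Products w_i * q_i exactly as printed in the last column of Table 2.
TABLE2_WQ = [-1.2, 0.3, 0.3, 0.3, 0.2, 0.1, 0.1, 18.1]
wqi_ward_1 = sum(TABLE2_WQ)  # the unit weights sum to 1, so no division is needed

print(f"K = {k:.1f}, q(Conductivity) = {q_conductivity:.1f}, WQI = {wqi_ward_1:.1f}")
```

Because the unit weights sum to one by construction, the index reduces to the sum of the w_i q_i products, which is why BOD, with the smallest standard value and a high rating, dominates the Ward No. 1 result.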

Table 3. Ward-wise WQI.

Ward No.	WQI	Ward No.	WQI
1	18.2	12	34.1
2	20.4	13	32.3
3	31.0	14	28.7
4	32.7	16	25.0
5	48.7	17	11.1
6	31.6	18	37.9
7	31.2	19	32.8
8	45.2	20	22.8
9	26.5	21	26.1
10	36.9	26	31.2
11	35.6
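The ward-wise values in Table 3 can be summarized against the two reference lines shown in Figure 5 (WQI = 25 and WQI = 50). The short Python sketch below does this; the variable names are hypothetical, and counting a value of exactly 25 in the lower group is an assumption.

```python
# Ward number -> WQI, transcribed from Table 3.
WARD_WQI = {
    1: 18.2, 2: 20.4, 3: 31.0, 4: 32.7, 5: 48.7, 6: 31.6, 7: 31.2,
    8: 45.2, 9: 26.5, 10: 36.9, 11: 35.6, 12: 34.1, 13: 32.3, 14: 28.7,
    16: 25.0, 17: 11.1, 18: 37.9, 19: 32.8, 20: 22.8, 21: 26.1, 26: 31.2,
}

# A lower WQI indicates better water quality (Table 1).
best_ward = min(WARD_WQI, key=WARD_WQI.get)
worst_ward = max(WARD_WQI, key=WARD_WQI.get)
mean_wqi = sum(WARD_WQI.values()) / len(WARD_WQI)

# Group wards by the Figure 5 reference lines.
at_or_below_25 = sorted(w for w, v in WARD_WQI.items() if v <= 25)
between_25_and_50 = sorted(w for w, v in WARD_WQI.items() if 25 < v <= 50)
above_50 = sorted(w for w, v in WARD_WQI.items() if v > 50)

print(f"best ward: {best_ward}, worst ward: {worst_ward}, mean WQI: {mean_wqi:.1f}")
print(f"wards at or below 25: {at_or_below_25}")
print(f"wards between 25 and 50: {len(between_25_and_50)}, above 50: {len(above_50)}")
```

No ward exceeds the upper reference line, which is consistent with the reported range of excellent to good water quality.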

Table 4. Correlation coefficient matrix.

	pH	DO	Conductivity	Alkalinity	Hardness	Chloride	TDS	BOD
pH	1
DO	0.2	1
Conductivity	0.6	0.3	1
Alkalinity	0.6	−0.2	0.3	1
Hardness	0.7	0.5	0.8	0.3	1
Chloride	−0.3	−0.2	−0.1	−0.03	−0.3	1
TDS	0.6	0.3	1	0.3	0.8	−0.1	1
BOD	0.4	0.3	0.2	0.3	0.4	−0.3	0.2	1
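To make the strongest relationships in Table 4 explicit, the following Python sketch scans the lower triangle of the matrix and lists the parameter pairs whose coefficient reaches a chosen threshold. The helper name `strong_pairs` is hypothetical, and the use of |r| ≥ 0.6 as the cut-off is an assumption made because the table reports coefficients rounded to one decimal place.

```python
PARAMS = ["pH", "DO", "Conductivity", "Alkalinity",
          "Hardness", "Chloride", "TDS", "BOD"]

# Lower triangle of the correlation matrix, transcribed from Table 4.
LOWER = [
    [1],
    [0.2, 1],
    [0.6, 0.3, 1],
    [0.6, -0.2, 0.3, 1],
    [0.7, 0.5, 0.8, 0.3, 1],
    [-0.3, -0.2, -0.1, -0.03, -0.3, 1],
    [0.6, 0.3, 1, 0.3, 0.8, -0.1, 1],
    [0.4, 0.3, 0.2, 0.3, 0.4, -0.3, 0.2, 1],
]


def strong_pairs(threshold=0.6):
    """Return (row parameter, column parameter, r) for |r| >= threshold."""
    pairs = []
    for i, row in enumerate(LOWER):
        for j, r in enumerate(row[:i]):  # skip the diagonal entry
            if abs(r) >= threshold:
                pairs.append((PARAMS[i], PARAMS[j], r))
    # Strongest relationships first.
    return sorted(pairs, key=lambda item: -abs(item[2]))


pairs = strong_pairs()
for first, second, r in pairs:
    print(f"{first} vs {second}: r = {r}")
```

In this listing TDS and conductivity stand out with a coefficient of 1, so the two parameters carry nearly the same information in any model that uses both.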

Table 5. Loads of 8 variables in two principal components.

Parameters	Eigenvalue	Percentage of Variance	Coefficients of PC1	Coefficients of PC2
pH	3.8	48.0%	0.4	0.2
DO	1.4	17.0%	0.2	−0.7
Conductivity	1.2	14.7%	0.5	0.04
Alkalinity	0.8	9.5%	0.3	0.6
Hardness	0.5	5.8%	0.5	−0.1
Chloride	0.3	3.3%	−0.2	0.3
TDS	0.1	1.7%	0.5	0.04
BOD	0	0.0%	0.2	−0.1
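The cumulative variance explained by the leading components can be obtained directly from the percentage column of Table 5. The sketch below assumes that the eigenvalue and variance columns are listed per component in rank order (PC1 to PC8) rather than per parameter, since eigenvalues belong to components; the variable names are hypothetical.

```python
from itertools import accumulate

# Percentage of variance per component, transcribed from Table 5
# (assumed to be ordered PC1, PC2, ..., PC8).
VARIANCE_PCT = [48.0, 17.0, 14.7, 9.5, 5.8, 3.3, 1.7, 0.0]

cumulative_pct = list(accumulate(VARIANCE_PCT))

# Share of the total variance retained by the two components in Table 5.
two_component_share = cumulative_pct[1]

for rank, (pct, cum) in enumerate(zip(VARIANCE_PCT, cumulative_pct), start=1):
    print(f"PC{rank}: {pct:.1f}% (cumulative {cum:.1f}%)")
```

Under this reading, the two components whose coefficients are tabulated account for roughly two thirds of the total variance, and the remainder is spread over the lower-ranked components.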

Table 6. Performance of the final model.

	Accuracy	Precision	F1 Score
LR	0.9	0.8	0.8
DT	0.9	0.9	0.9
RF	0.9	0.9	0.9
SVM	0.9	0.9	0.9
KNN	0.9	0.9	0.9
NB	0.9	0.9	0.9
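For reference, the three scores in Table 6 can be computed from true and predicted labels as sketched below. The labels are invented purely for illustration and are not data from this study; the function name is hypothetical, and the binary, positive-class form of precision and F1 is an assumption, since the averaging scheme used for the reported scores is not stated in the table.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and F1 score for a binary classification."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "f1": f1}


# Hypothetical labels (1 = acceptable water quality, 0 = not acceptable).
y_true = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1, 1, 0]

scores = binary_metrics(y_true, y_pred)
print({name: round(value, 2) for name, value in scores.items()})
```

Reporting the scores to two or three decimal places, rather than one, would make differences between models of similar quality visible.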

Table 7. Hyperparameter settings and Function selection.

Model	Hyperparameters
Logistic Regression (scikit-learn LogisticRegression)	penalty = ‘l2’, C = 1.0, solver = ‘lbfgs’, max_iter = 100, multi_class = ‘auto’, tol = 1 × 10⁻⁴
Decision Tree (scikit-learn DecisionTreeClassifier)	criterion = ‘gini’, splitter = ‘best’, max_depth = None, min_samples_split = 2, min_samples_leaf = 1, max_features = None
Random Forest (scikit-learn RandomForestClassifier)	n_estimators = 100, criterion = ‘gini’, max_depth = None, min_samples_split = 2, min_samples_leaf = 1, max_features = ‘auto’, bootstrap = True
Support Vector Machine (scikit-learn SVC)	C = 1.0, kernel = ‘rbf’, degree = 3, gamma = ‘scale’, coef0 = 0.0, probability = False, tol = 1 × 10⁻³
K-Nearest Neighbors (scikit-learn KNeighborsClassifier)	n_neighbors = 5, weights = ‘uniform’, algorithm = ‘auto’, leaf_size = 30, p = 2, metric = ‘minkowski’
Gaussian Naïve Bayes (scikit-learn GaussianNB)	priors = None, var_smoothing = 1 × 10⁻⁹
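A minimal sketch of how the six classifiers might be instantiated with the settings of Table 7 is given below. It is not the authors' code. Three listed parameters are deliberately left out or replaced because recent scikit-learn releases deprecate or no longer accept them: `penalty` and `multi_class` for LogisticRegression and `probability` for SVC are omitted (the listed values correspond to the library defaults), and `max_features = 'auto'` for the random forest is written as `'sqrt'`, its equivalent for classifiers. The `random_state` argument and the `build_models` helper are assumptions added for reproducibility.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(random_state=0):
    """Return the six classifiers keyed by the abbreviations used in Table 6."""
    return {
        # penalty='l2' and multi_class='auto' are omitted (see lead-in).
        "LR": LogisticRegression(C=1.0, solver="lbfgs", max_iter=100, tol=1e-4),
        "DT": DecisionTreeClassifier(
            criterion="gini", splitter="best", max_depth=None,
            min_samples_split=2, min_samples_leaf=1, max_features=None,
            random_state=random_state,
        ),
        # 'sqrt' stands in for the older 'auto' setting listed in Table 7.
        "RF": RandomForestClassifier(
            n_estimators=100, criterion="gini", max_depth=None,
            min_samples_split=2, min_samples_leaf=1, max_features="sqrt",
            bootstrap=True, random_state=random_state,
        ),
        # probability=False is the default and is omitted (see lead-in).
        "SVM": SVC(C=1.0, kernel="rbf", degree=3, gamma="scale",
                   coef0=0.0, tol=1e-3),
        "KNN": KNeighborsClassifier(
            n_neighbors=5, weights="uniform", algorithm="auto",
            leaf_size=30, p=2, metric="minkowski",
        ),
        "NB": GaussianNB(priors=None, var_smoothing=1e-9),
    }


models = build_models()
```

Each model exposes the same `fit` and `predict` interface, so the dictionary can be looped over to train and score all six classifiers on the same train and test split.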

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
