1. Introduction
Water scarcity and inadequate sanitation hinder economic growth and public health worldwide, leaving people without access to clean drinking water and vulnerable to waterborne diseases, particularly in developing nations [1,2]. More than two billion people lack access to clean drinking water, and the spread of waterborne diseases drives high disease rates in developing nations [3,4]. Rapid industrial growth and urban development in North Bhubaneswar have worsened water contamination, as effluents from domestic, agricultural, and industrial sources enter the environment [5,6,7]. Contamination of the river and other water resources affects about one and a half million people throughout the city [8,9]. Assessing water quality therefore requires an organized strategy that addresses existing implementation challenges [10]. Combining the Water Quality Index (WQI) with geostatistical modeling and contemporary assessment systems enables systematic observation and improved monitoring of water quality [11,12]. Poor sanitation management and the release of pollutants cause severe environmental damage and endanger public health [13]. In the city, industrial contamination combines with household sewage spills to degrade water resources severely, while inadequate water purification facilities and agricultural runoff create additional problems [14]. Water sustainability and proper infrastructure are crucial for establishing reliable drinking water systems and ensuring environmental safety within the city [15].
Several researchers have used the Weighted Arithmetic Water Quality Index (WAWQI) method to study urban water supplies and surface waters, identifying contaminated areas and developing protection strategies for delivering safe water [16,17,18]. Population growth, declining water quality, and shrinking water resources together point to a growing emergency. Piped water carries increasing levels of chemical and organic pollutants, posing a significant risk to human health. Where surface water is the available source, it must be treated before being supplied to the distribution network [19].
This study utilized Geographic Information Systems (GIS) to analyze the spatial distribution of physicochemical parameters and identify patterns in water quality across the study area. By integrating GIS with hydro-geochemical characteristics and statistical analysis, we aim to develop a robust framework for mapping and monitoring water quality [20].
Hydro-geochemical characteristics, when combined with statistical analysis, form a practical framework for mapping and monitoring water quality [21]. Existing research focuses primarily on the quality of piped water in central Bhubaneswar; studies of the urban supply in the northern parts of the city are scarce. This comprehensive assessment provides a deeper understanding of water issues throughout the city and identifies essential requirements for implementing sustainable management systems. In this context, the present study aims to assess the quality of treated drinking water supplied through the municipal distribution network (tap water) across 21 wards of North Bhubaneswar. The purpose is to identify spatial variations and potential contamination zones in the supply water using a combination of the WAWQI, Principal Component Analysis (PCA), and machine learning classification models. The findings aim to support policymakers and urban planners in implementing targeted improvements to water supply infrastructure and public health safety.
Scientists have noted the increasing risk of water pollution resulting from industrial growth and inadequate waste disposal systems [22]. Accelerating population growth and industrial activity in North Bhubaneswar have caused severe water pollution, which demands more advanced water quality evaluation methods. Existing assessment systems fail to track pollution patterns across sampling areas and neglect the interactions among multiple pollutants, so updated assessment methods are required [23]. This research develops a water quality assessment system that integrates WQI analysis with geostatistical methods. The study employs a WQI-based geostatistical framework to investigate temporal and spatial variations in water quality and identify the key factors affecting it, thereby supporting informed decision-making [24]. By meeting these objectives, the research aims to improve the tracking of environmental conditions, lower public health risks, and support sustainable community development in North Bhubaneswar.
The findings can inform policy creation, improved water management strategies, and future investigations, to the benefit of the approximately 1.5 million people living in the city.
Population growth has created extreme demand for surface water in recent decades [25]. Surface water resources are under severe stress from the combined effects of urban growth and pollution [26]. Irrigation resources are diminishing and water conditions are worsening as population growth coincides with the expansion of industry, agriculture, and dairy farming [27,28]. New patterns of economic development degrade water quality, making it progressively harder to maintain reliable water supply operations [29]. In Odisha, as in other states, the rapidly growing population and rising health awareness have made the provision of drinking water a priority [30]. The expected population growth necessitates improving water distribution systems to meet future demand. Drinking or coming into contact with contaminated water can cause serious health problems, including cholera, hepatitis A, dysentery, and diarrhea.
Samples from points in the water distribution network were evaluated for pH, chloride, hardness, and dissolved oxygen, together with alkalinity, electrical conductivity (EC), total dissolved solids, and biochemical oxygen demand [31]. All water samples were tested at the Environmental Engineering Laboratory, KIIT DU. PCA reduced the dimensionality of the dataset while retaining its essential information, since the parameters were sufficiently correlated. The findings aligned with earlier research that employed PCA to identify key components in water quality analysis [32]. PCA was chosen because it offers a standardized way to organize the variables of a water quality dataset. The leading causes of water quality deterioration are aged pipeline materials and waste from industrial areas, residential zones, institutional buildings, and landfills. Maps generated with ArcGIS show where water quality monitoring requires attention in each ward of North Bhubaneswar, helping policymakers identify priority areas. The analysis shows how the wards compare on WQI values and PCA results. Achieving universal access to sustainable drinking water requires continuous monitoring and modern infrastructure, alongside dedicated plans to meet Goal 6 of the Sustainable Development Goals [33].
This research pioneers an integrated approach to water quality assessment in North Bhubaneswar, leveraging WQI analysis and geospatial statistical methods to identify key factors affecting water quality. By harnessing GIS and PCA, spatial maps of water quality parameters were generated, which will aid in making policy decisions for sustainable water management.
Some researchers in India have examined land use changes and hydrological impacts, while studies on a global scale have analyzed the effects of climate variability on water resources [34]. This research advances the field by combining geospatial analysis with machine learning to investigate complex interactions among land use, climate, and hydrology in a specific watershed. This study offers nuanced insights into system dynamics, enabling the development of more effective water resource management strategies tailored to India’s regional needs.
2. Materials and Methods
2.1. Study Area
The study area is situated in the northern section of the Bhubaneswar Municipal Corporation (BMC) administrative unit of Khordha District, Odisha, between latitudes 20°16′ and 20°24′ N and longitudes 85°45′ and 85°54′ E, with an area of approximately 255 km², as shown in Figure 1. The terrain is undulating with an average elevation of 46 m above sea level [35]. The river system comprises the Kuakhai in the east and the Daya in the south. Streams such as JhumkaNala and GanguaNala feed the Daya River on the southern side of the city. The city has several freshwater lakes, including the historic Bindusagar and Vanivihar Lake [36]. It experiences a humid tropical climate with distinct winter, summer, and rainy seasons. Summer (April–May) is very hot and humid, with day temperatures ranging from 31 to 45 °C, whereas winter (December–January) is moderately cold, with temperatures between 12 and 30 °C. The average annual rainfall in the area has been between 1450 and 1550 mm for the last 10 years. The monsoon season (July–September) accounts for the majority (85%) of the total annual rainfall, with the highest amounts recorded in August. The topography is dominated by floodplains and alluvial soils in the southeast and hilly terrain in the northwest [37]. The city’s population density is approximately 2131 persons per square kilometer, as per the 2011 Census (Bhubaneswar Municipal Corporation, 2011) [38]. The Indian government has chosen Bhubaneswar as part of its smart city development strategy [39]. In Figure 1, the numbered labels indicate the ward numbers of North Bhubaneswar, the primary administrative units considered for water sampling in this study. The ward boundaries were overlaid to depict the spatial distribution of the 21 selected wards where water samples were collected. The figure also highlights key locations within the study area, including educational institutions, healthcare centers, commercial hubs, high-density residential areas, and rapidly developing construction sites. Specifically, these include Apollo Hospital, KIIT Deemed to be University, Utkal Hospital, Infocity, Esplanade Mall, and the International Kalinga Stadium, all hotspots with significant human activity and infrastructure load. They were highlighted because a substantial portion of the water samples were collected around them, where the risks of pipeline leakage, contamination, and infrastructural stress are relatively higher.
Leak points were identified through a thorough inspection, which revealed damage and fissures caused by contractor negligence during pipe installation and jointing in the leak-prone wards of the study area. Most leaks were found at coupler, elbow, valve, and saddle clamp connections. Backfill and bedding sand containing large, sharp stones scratches and punctures the pipes and fittings [40]. The fluctuation in dissolved oxygen (DO) is directly proportional to the corrosion rates caused by redox couple reactions within the pipeline system [41]. DO is consumed in reactions with pipeline materials such as copper, steel, and iron [42]. Pipe materials therefore play a critical role in corrosion and the degradation of water quality in water distribution systems [43]. This infrastructural vulnerability is closely linked to the water quality indicators analysed in this study. Parameters such as DO, EC, and hardness are directly affected by leaching and contamination resulting from pipeline damage. These considerations are reflected in both the WQI calculation and the multivariate analysis, where such parameters contributed significantly to the observed spatial variation. In light of the above, the study focused on Bhubaneswar’s northern region, which is home to several educational institutions and significant new construction developments. The detailed workflow is described in Figure 2.
2.2. Quality Parameters
This study considers the following water quality parameters: pH, DO, EC, Alkalinity, Hardness, Chloride, Total Dissolved Solids (TDSs), and Biochemical Oxygen Demand (BOD). All the samples were withdrawn from the municipal tap water interfaced with the surface water treatment system of Bhubaneswar Municipal Corporation (BMC). The northern part of the Bhubaneswar city was studied, which includes 21 wards (Ward Nos. 1 to 14, 16 to 21, and 26). The sites for collection were chosen based on the risk of surface and pipe water pollution, which included industrial areas, commercial markets, densely populated regions, construction sites, and schools and universities.
A total of 105 water samples (5 per ward) were collected in 1-L, pre-cleaned polyethylene bottles, stored at 5 °C, and transported to the Environmental Engineering Laboratory at KIIT DU, following the Indian Standard and WHO guidelines [44]. All results were compared with the values listed in Table 1. Laboratory procedures followed the Standard Methods for the Examination of Water and Wastewater, with sterilized glassware and calibrated instruments used to ensure accuracy [45]. Measurements of pH and DO were performed using a standard multiparameter analyzer. EC and TDSs were assessed using a calibrated conductivity meter. Alkalinity and hardness were determined through acid-base titration and ethylenediaminetetraacetic acid (EDTA) titration, respectively. Chloride concentration was measured using argentometric titration, and BOD was analyzed by incubating the samples at 20 °C for 5 days. The WQI was calculated using the mean value of each parameter across the five samples per ward. The samples were collected during the winter season of 2023–2024 (November–February) to reflect dry-season conditions.
2.3. Calculation of WQI
The WQI was calculated ward-wise using the average parameter values for each location. Eight parameters were considered in the WQI computation: pH, DO, conductivity, alkalinity, hardness, chloride, TDSs, and BOD. Lead, copper, and zinc were excluded because their concentrations were well within the desirable limits for supply water.
The drinking water quality standards approved by the BIS in 2012 and the WHO in 2011 were used to calculate the WQI. The WQI is an effective tool for informing the public and policymakers about water quality [46]. It summarizes and communicates the overall status of water quality, supporting informed choices in resource management and health programs [47]. It also allows the water quality metrics considered essential to be combined into a single value; Table 1 presents the World Health Organization (WHO) guideline limits for the parameters used to evaluate water quality in this study. Because it is considered the most suitable option for the present case, the WQI computed with the weighted arithmetic index method is employed in this paper to evaluate the impact of contaminants on supply water [48].
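The WQI is given in Equation (1):

$$\mathrm{WQI} = \frac{\sum_{i=1}^{n} q_i W_i}{\sum_{i=1}^{n} W_i} \tag{1}$$

where $q_i$ is the quality rating (sub-index) of the $i$th water quality parameter, $W_i$ is the unit weight of the $i$th water quality parameter, and $n$ is the number of parameters. In addition, $q_i$, which relates the value of the parameter in polluted water to the standard permissible value, is obtained as follows:

$$q_i = 100 \times \frac{V_i - V_{\mathrm{ideal}}}{S_i - V_{\mathrm{ideal}}}$$

where $V_i$ is the estimated value of the $i$th parameter, $V_{\mathrm{ideal}}$ is the ideal value of the $i$th parameter, and $S_i$ is the standard permissible value of the $i$th parameter; in most cases, $V_{\mathrm{ideal}}$ is 0 except for pH and DO. For pH, $V_{\mathrm{ideal}}$ is 7 and, for DO, $V_{\mathrm{ideal}}$ is 14.6 mg/L. The unit weight ($W_i$), which is inversely proportional to the value of the recommended standard, is obtained from Equations (2) and (3):

$$W_i = \frac{K}{S_i} \tag{2}$$

$$K = \frac{1}{\sum_{i=1}^{n} \frac{1}{S_i}} \tag{3}$$

where $K$ is the proportionality constant.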
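The weighted arithmetic calculation can be illustrated with a minimal Python sketch. It assumes the usual form of the method: a quality rating of 100 × (measured − ideal) / (standard − ideal), unit weights inversely proportional to the standard and normalized to sum to one, and a weighted mean of the ratings. The function names and the numeric limits in `standards` are illustrative placeholders, not the values of Table 1.

```python
def unit_weights(standards):
    """Unit weight K / S_i with K = 1 / sum(1 / S_i), so the weights sum to 1."""
    k = 1.0 / sum(1.0 / s for s in standards.values())
    return {name: k / s for name, s in standards.items()}


def quality_rating(value, standard, ideal=0.0):
    """Sub-index: 100 * (measured - ideal) / (standard - ideal)."""
    return 100.0 * (value - ideal) / (standard - ideal)


def wawqi(measured, standards, ideals):
    """Weighted arithmetic WQI: sum(q_i * W_i) / sum(W_i)."""
    weights = unit_weights(standards)
    weighted_sum = sum(
        weights[p] * quality_rating(measured[p], standards[p], ideals.get(p, 0.0))
        for p in standards
    )
    return weighted_sum / sum(weights.values())


# Placeholder limits (assumptions for illustration only).
standards = {"pH": 8.5, "DO": 5.0, "EC": 400.0}
# Ideal values: 7 for pH and 14.6 mg/L for DO, 0 for everything else.
ideals = {"pH": 7.0, "DO": 14.6}

# Hypothetical ward means for the three parameters.
ward_means = {"pH": 7.5, "DO": 7.8, "EC": 200.0}
ward_wqi = wawqi(ward_means, standards, ideals)
```

Because the weights are normalized, a sample sitting exactly at every permissible limit scores 100 and a sample at every ideal value scores 0, which gives a quick sanity check on any implementation.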
2.4. Machine Learning Models
In the present study, supervised machine learning (ML) algorithms were employed to classify water samples based on their WQI category. The models were initially trained using eight physicochemical parameters measured from municipal tap water samples: pH, EC, DO, alkalinity, hardness, chloride, total dissolved solids (TDSs), and biochemical oxygen demand (BOD). These variables were selected for their regulatory relevance and importance in water quality assessment. To further enhance predictive accuracy and model robustness, the final model development incorporated a total of 17 input features grouped into three categories based on their relevance to groundwater quality and fluoride behavior in aquifer systems. The first category comprises the eight core physicochemical parameters, along with additional indicators, such as bicarbonate (HCO3−), sulfate (SO42−), calcium (Ca2+), magnesium (Mg2+), sodium (Na+), potassium (K+), and fluoride (F−), which reflect geochemical interactions that affect water quality. The second category comprises spatial features, latitude, longitude, and well depth, which account for locational heterogeneity and aquifer depth variability. The third category includes temporal and metadata features: year of sampling, block name, and WQI class label. The year variable enables modeling of temporal trends, while the block name, encoded categorically, captures local administrative differences. The WQI class label served as the target output for classification tasks. The dataset was labeled based on these WQI classifications (e.g., good, excellent), and model performance was evaluated using 5-fold cross-validation to assess accuracy, precision, recall, and F1 score. Before model training, the dataset underwent several preprocessing steps to ensure quality, consistency, and compatibility with machine learning algorithms. 
Continuous variables were standardized to zero mean and unit variance, giving uniform feature scaling and improving convergence across algorithms. Categorical variables, including the block name and WQI class label, were one-hot encoded into a numerical format suitable for machine learning (ML) models. For features with missing values constituting less than 2% of the dataset, median imputation was applied to maintain data integrity without introducing bias. The resulting preprocessed dataset formed the input matrix (X), while the output vector (y) depended on the modeling task: fluoride concentration was the target for regression tasks, and the WQI class label was the target for classification tasks.
Before model training, we applied a rigorous data preparation pipeline to ensure consistency and minimize bias. First, we performed an initial quality check to remove duplicate records and detect outliers using the interquartile range rule. Features with missing values (<2% of all entries) were imputed using the median, thereby preserving a central tendency without skewing the distributions. All continuous inputs (e.g., pH, EC, DO, alkalinity, hardness, chloride, TDS, BOD, HCO3−, SO42−, Ca2+, Mg2+, Na+, K+, F−, well depth) were then standardized to zero mean and unit variance to harmonize scales and accelerate convergence of gradient-based algorithms. Categorical fields (block name, year, and WQI class label) were converted via one-hot encoding to create binary indicator variables, enabling tree and distance-based learners to effectively leverage locational and temporal metadata.
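The three core preprocessing steps (median imputation, standardization, and one-hot encoding) can be sketched with the Python standard library alone. This is an illustrative sketch, not the pipeline used in the study; the function names and sample values are hypothetical, missing entries are assumed to be represented by `None`, and the population standard deviation is assumed for scaling.

```python
from statistics import mean, median, pstdev


def impute_median(column):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def standardize(column):
    """Rescale a column to zero mean and unit variance (population variance)."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


def one_hot(labels):
    """Return the sorted category list and one binary indicator row per label."""
    categories = sorted(set(labels))
    rows = [[1 if lab == cat else 0 for cat in categories] for lab in labels]
    return categories, rows


# Hypothetical EC readings with one missing value, and hypothetical block names.
ec = impute_median([210.0, None, 190.0, 400.0])
ec_scaled = standardize(ec)
blocks, block_indicators = one_hot(["B2", "B1", "B2", "B3"])
```

Imputing before scaling matters here: the mean and standard deviation used for standardization are then computed on a complete column.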
2.4.1. Logistic Regression (LR)
Logistic regression (LR) is a statistical and machine learning approach for binary and multi-class classification tasks. It uses the logistic (sigmoid) function to estimate probabilities, thereby modeling the relationship between a dependent variable and one or more independent variables. The sigmoid function is suitable for probability prediction because it maps any real-valued input to a value between 0 and 1. Logistic regression outputs the probability of the target class and classifies data according to a decision threshold, usually 0.5. It is widely used because of its simplicity, interpretability, and effectiveness on linearly separable data in fields including medicine, finance, and the social sciences.

$$P(Y = 1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n)}}$$

where $P(Y = 1 \mid X)$ is the probability of the positive class, $\beta_0, \beta_1, \ldots, \beta_n$ are the regression coefficients, $X_1, \ldots, X_n$ are the input features, and $e$ is Euler’s number.
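As a minimal sketch of how a fitted logistic regression turns features into a class, the snippet below evaluates the sigmoid of a linear score and applies a 0.5 threshold. The coefficient values are hypothetical and are not estimates from this study.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(features, coefficients, intercept):
    """Probability of the positive class for one sample."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return sigmoid(z)


def predict_class(features, coefficients, intercept, threshold=0.5):
    """Assign class 1 when the probability reaches the decision threshold."""
    return 1 if predict_proba(features, coefficients, intercept) >= threshold else 0


# Hypothetical coefficients for two standardized features.
coefficients = [0.5, -0.25]
probability = predict_proba([1.0, 2.0], coefficients, intercept=0.0)
label = predict_class([1.0, 2.0], coefficients, intercept=0.0)
```

In this example the linear score is exactly zero, so the sample lies on the decision boundary and the probability equals the threshold.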
2.4.2. Decision Tree (DT)
A decision tree is a supervised learning technique applied to regression and classification tasks. It is a rule-based model that builds a tree-like structure by dividing the dataset into subsets according to the values of the input attributes. To maximize information gain, the algorithm uses impurity measures, such as the Gini Index or Entropy, to determine which feature is most suitable for splitting the data at each internal node. The process continues recursively until a stopping criterion, such as the maximum depth or the minimum number of samples per leaf, is met. Decision trees handle both numerical and categorical data efficiently and are easy to interpret.
Entropy (for Information Gain):

$$H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$

where $p_i$ is the probability of class $i$, and $c$ is the number of classes. The best split is chosen by maximizing the Information Gain.
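The split criterion can be made concrete with a short sketch that computes entropy and the information gain of a candidate binary split. Information gain is taken here in its usual form, the parent entropy minus the size-weighted entropy of the child nodes; the labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - children


# Hypothetical WQI class labels before and after a candidate split.
parent = ["good", "good", "excellent", "excellent"]
pure_split = information_gain(parent, ["good", "good"], ["excellent", "excellent"])
useless_split = information_gain(parent, ["good", "excellent"], ["good", "excellent"])
```

A split that separates the classes perfectly recovers all of the parent entropy, while a split that leaves each child as mixed as the parent gains nothing.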
2.4.3. Random Forest
Random Forest is an ensemble learning method applied to regression and classification problems. During training, it constructs numerous decision trees and aggregates their results to provide more reliable and accurate predictions. To ensure variety among the trees, each one is trained on a different subset of the features and data. The final prediction is obtained by majority voting in classification or by averaging the outputs in regression. This method improves generalization and lessens the overfitting that is prevalent in single decision trees. Random Forest is widely used due to its robustness, high accuracy, and ability to handle large datasets with missing values:

$$\hat{y} = \frac{1}{N} \sum_{i=1}^{N} f_i(x)$$

where $\hat{y}$ is the final prediction, $N$ is the number of trees, and $f_i(x)$ represents the prediction of the $i$th decision tree.
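The two ingredients described above, training each tree on a resampled dataset and combining the per-tree outputs, can be sketched as follows. The tree predictions are hypothetical stand-ins for the outputs of fitted trees, and bootstrap resampling with replacement is assumed as the way each tree receives a different data subset.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, as used to train one tree."""
    return [rng.choice(rows) for _ in rows]


def aggregate_classification(tree_predictions):
    """Majority vote over the class labels predicted by the trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


def aggregate_regression(tree_predictions):
    """Average of the numeric predictions of the trees."""
    return mean(tree_predictions)


rng = random.Random(0)  # fixed seed so the resampling is repeatable
sample = bootstrap_sample(list(range(10)), rng)

# Hypothetical outputs of three trees for one water sample.
voted_class = aggregate_classification(["good", "excellent", "good"])
averaged_value = aggregate_regression([20.0, 30.0, 40.0])
```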
2.4.4. Support Vector Machine (SVM)
The Support Vector Machine (SVM) is a supervised learning approach widely used for classification problems. It determines the hyperplane that separates data points of different classes with the largest margin. The margin is the distance between the hyperplane and the closest data points from each class, known as support vectors. SVM seeks to maximize this margin, thereby enhancing the model’s capacity to generalize to unseen data. Using kernel functions, SVM can handle both linear and non-linear classification problems, making it a versatile approach, especially in high-dimensional spaces and complex datasets. The hyperplane is given by Equation (8):

$$w \cdot X + b = 0 \tag{8}$$

where $w$ is the weight vector, $X$ is the input vector, and $b$ is the bias term. The optimization problem for SVM is given by Equation (9):

$$\min_{w,\, b} \ \frac{1}{2} \lVert w \rVert^{2} \tag{9}$$

subject to $y_i (w \cdot X_i + b) \ge 1$ for every training sample $(X_i, y_i)$ with $y_i \in \{-1, +1\}$.
For data that are not linearly separable, SVM uses the kernel trick to map the data into a higher-dimensional space.
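For the linear case, the role of the hyperplane and the margin can be illustrated with a small sketch. It assumes class labels coded as −1 and +1 and the hard-margin constraint that every training sample has a functional margin of at least 1; the weight vector and samples are hypothetical, and no solver is included.

```python
import math


def decision_value(w, x, b):
    """Signed score w . x + b; its sign gives the side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, x, b):
    """Predict +1 or -1 from the sign of the decision value."""
    return 1 if decision_value(w, x, b) >= 0 else -1


def margin_width(w):
    """Distance between the two margin boundaries, 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def satisfies_margin(w, b, samples):
    """Check the constraint y * (w . x + b) >= 1 for every (x, y) pair."""
    return all(y * decision_value(w, x, b) >= 1 for x, y in samples)


# Hypothetical separating hyperplane and labeled samples.
w, b = [1.0, 0.0], 0.0
samples = [([2.0, 0.0], 1), ([-1.5, 3.0], -1)]
feasible = satisfies_margin(w, b, samples)
```

Minimizing the squared norm of the weight vector under this constraint is what widens the margin, since the margin width is inversely proportional to that norm.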
2.4.5. K-Nearest Neighbors (KNN)
K-Nearest Neighbors (KNN) is a non-parametric, instance-based learning algorithm used for classification and regression. A data point is classified according to the majority vote of its k closest neighbors in the feature space. Although various metrics can be employed, Euclidean distance is commonly used to quantify the distance between points. KNN does not make any assumptions about the underlying data distribution, making it simple and versatile. It is sensitive to the choice of k and the scaling of features. Despite its simplicity, KNN can be effective for pattern recognition and classification tasks.
2.4.6. Distance Metric (Euclidean Distance)
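$$d(X, Y) = \sqrt{\sum_{i=1}^{n} (X_i - Y_i)^{2}}$$

where $X$ and $Y$ are two data points, and $n$ is the number of features. The class is determined as follows:

$$\hat{y} = \arg\max_{c} \sum_{j \in N_k(X)} \mathbb{1}(y_j = c)$$

where $c$ is a class label, $N_k(X)$ is the set of the $k$ nearest neighbors of $X$, $y_j$ is the label of neighbor $j$, and $\mathbb{1}$ is an indicator function.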
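A minimal sketch of KNN classification with Euclidean distance and majority voting is given below. The training points and labels are hypothetical two-feature examples, assumed to be already standardized, and ties are resolved in favor of the label encountered first among the nearest neighbors.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_predict(query, points, labels, k=3):
    """Label of the query by majority vote among its k nearest points."""
    ranked = sorted(zip(points, labels), key=lambda pl: euclidean(query, pl[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training samples (two scaled features each) and their classes.
points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
labels = ["excellent", "excellent", "good", "good"]
predicted = knn_predict([0.2, 0.3], points, labels, k=3)
```

Because the vote depends on raw distances, a feature measured on a large scale (such as EC in µS/cm) would dominate an unscaled comparison, which is why standardization precedes KNN.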
2.4.7. Naive Bayes (NB)
Naive Bayes is a fast and simple probabilistic classification algorithm based on Bayes’ Theorem. It assumes that the features are conditionally independent given the class label, which simplifies computation. Despite this strong independence assumption, Naive Bayes often performs well in real-world applications, especially with high-dimensional data. It calculates the posterior probability of each class given the input features and assigns the class with the highest probability. Naive Bayes is widely used in text classification, spam detection, and sentiment analysis due to its efficiency, scalability, and good performance with relatively small training data. Bayes’ Theorem is as follows:
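$$P(C \mid X) = \frac{P(X \mid C)\, P(C)}{P(X)}$$

where $P(C \mid X)$ is the probability of class $C$ given features $X$, $P(X \mid C)$ is the likelihood of features given class $C$, $P(C)$ is the prior probability of class $C$, and $P(X)$ is the probability of features $X$. For Gaussian Naive Bayes (GNB), the likelihood follows a normal distribution:

$$P(X_i \mid C) = \frac{1}{\sqrt{2\pi\sigma_C^{2}}} \exp\left(-\frac{(X_i - \mu_C)^{2}}{2\sigma_C^{2}}\right)$$

where $\mu_C$ and $\sigma_C^{2}$ are the mean and variance of feature $X_i$ in class $C$.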
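The posterior computation for Gaussian Naive Bayes can be sketched as below. Per-class feature means, variances, and priors are hypothetical placeholders rather than estimates from the study data, and the sum is carried out in log space to avoid numerical underflow when many features are multiplied.

```python
import math


def gaussian_pdf(x, mu, var):
    """Normal density with mean mu and variance var evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gaussian_log_pdf(x, mu, var):
    """Logarithm of the normal density, computed without exponentiating."""
    return -0.5 * math.log(2.0 * math.pi * var) - ((x - mu) ** 2) / (2.0 * var)


def gnb_posteriors(x, params, priors):
    """Posterior probability of each class under the independence assumption.

    params[c] is a list of (mean, variance) pairs, one per feature.
    """
    log_joint = {}
    for c, prior in priors.items():
        log_joint[c] = math.log(prior) + sum(
            gaussian_log_pdf(xi, mu, var) for xi, (mu, var) in zip(x, params[c])
        )
    # Subtract the maximum before exponentiating, then normalize by P(X).
    top = max(log_joint.values())
    unnormalized = {c: math.exp(v - top) for c, v in log_joint.items()}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}


# Hypothetical (mean, variance) of pH and EC per class, and class priors.
params = {
    "excellent": [(7.0, 0.04), (150.0, 400.0)],
    "good": [(7.8, 0.04), (300.0, 400.0)],
}
priors = {"excellent": 0.3, "good": 0.7}
posteriors = gnb_posteriors([7.0, 150.0], params, priors)
predicted_class = max(posteriors, key=posteriors.get)
```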
3. Results
The analyses followed the APHA 2022 protocols, and the observed values were cross-verified against the WHO and BIS drinking water standards. The laboratory results for the physicochemical parameters are presented as box plots in Figure 3A–H, which depict the range of each parameter from minimum to maximum. Around 32.0% of areas deviated from the EC standard. Similarly, SameiGadia, Mancheswar Village, and Chakeisiani Area (all located in Ward 5) exhibited excessive hardness, attributed to natural geological processes, industrial discharge, and urban runoff; such hardness leads to pipe scaling and reduced soap effectiveness. About 13.6% of areas deviated from the hardness standard. Parameters like chloride, TDS, BOD, and alkalinity remained within permissible limits.
3.1. Illustration of Water Quality
The assessment of water quality across the 21 selected wards was carried out using eight key physicochemical parameters: pH, DO, EC, Total Dissolved Solids (TDSs), Alkalinity, Hardness, Chloride, and Biochemical Oxygen Demand (BOD). Descriptive statistics for each parameter, including minimum, maximum, mean, and standard deviation values, were used to evaluate water quality across the study area, as discussed below.
3.1.1. pH
As shown in Figure 3A, the maximum pH value, 8.5, was recorded in Ward No. 26 (Sranapalli), while the minimum, 5.6, was observed in Ward No. 20 (Sriram Nagar). All other water samples were within the acceptable pH range of 6.5 to 8.5 specified in IS 10500.
3.1.2. DO
DO levels varied across the wards: Ward No. 20 (Tarini Nagar) had the highest recorded value at 8.4 mg/L, while the lowest, 7.3 mg/L, was noted in Ward No. 2, as illustrated in Figure 3B. According to IS 10500, the acceptable range for DO is 6.5 to 8 mg/L. Although Ward No. 20 initially showed a slight deviation, it is now approaching the standard range, indicating a return to safe levels.
3.1.3. EC
As shown in Figure 3C, the EC values of gravity-fed water across all study locations ranged from 67.7 to 449.9 µS/cm. These values were generally within the acceptable limit of 400 µS/cm set by the WHO, with exceptions observed in specific areas including Ward No. 5 (Satya Nagar & Mancheswar Village), Ward No. 10 (IDCO Colony, Mancheswar Industrial Estate, Votpada Village), Ward No. 14 (NiladriVihar S 5), Ward No. 16 (NALCO Nagar, Mayfair area), and Ward No. 20 (Nilachakra Nagar). The higher EC values observed in Wards 5, 10, 14, 16, and 20 indicate possible contamination from dissolved salts, which could originate from industrial effluents, natural groundwater salinity, or seepage of wastewater.
3.1.4. Alkalinity
Alkalinity levels in the northern zone of Bhubaneswar, illustrated in Figure 3D, varied significantly: the SPM Park area in Ward No. 17 showed the lowest level at 16 mg/L, and the Hanspal Village area in Ward No. 4 registered the highest at 128 mg/L. This variation is attributed to differences in the concentration of minerals, particularly carbonates and bicarbonates, present in the water across wards. All alkalinity values across the study area are within the permissible limit of IS 10500.
3.1.5. Hardness
Water hardness in the study area is primarily attributed to elevated concentrations of alkaline earth ions, specifically calcium (Ca²⁺) and magnesium (Mg²⁺) [49,50]. According to Figure 3E, the hardness levels ranged from 13.1 mg/L to 262.4 mg/L. High hardness in the supply water appeared to be influenced by the proximity of certain areas to sewage drains.
3.1.6. Chloride
The chloride ion concentration in the supply water ranged from 21.3 mg/L to 87.0 mg/L, as depicted in Figure 3F. The Metro Home Area of Ward No. 6 had the lowest chloride concentration, while Hanspal Village in Ward No. 4 registered the highest. Nevertheless, all the wards assessed exhibited chloride levels within the permissible range. Chloride (Cl⁻) occurs naturally in the environment from sources such as suspended salt particles, soil porosity and absorbency, residual food waste, and farm manures used in agricultural fields.
3.1.7. TDSs
The Total Dissolved Solids (TDSs) values in the supply water ranged from 44.0 mg/L to 312.2 mg/L, as shown in Figure 3G. The highest value was recorded in Ward No. 10, specifically in Votapada Village (312.2 mg/L). This elevated value may have resulted from higher ambient temperatures, which facilitate weathering, enhance ion exchange capacity, promote desorption, and accelerate the dissolution of minerals. The increased temperature also appeared to elevate both pH and EC in the area. Since Ward No. 10 recorded the highest values for both pH and EC, a direct relationship between TDS and these parameters is inferred.
3.1.8. Biochemical Oxygen Demand
The BOD (Biochemical Oxygen Demand) values ranged from 0.6 mg/L to 5.0 mg/L, as illustrated in Figure 3H. The highest BOD concentration was measured in Ward No. 3 (KananVihar Phase-1), while the lowest was recorded in Ward No. 6 (Pragati Vihar). The elevated BOD levels, particularly in Ward No. 3, are attributed to increased biological activity, which is often linked to warmer temperatures. BOD is a crucial parameter in stream pollution control and is particularly important for regulating organic load to maintain the required levels of DO.
3.1.9. Identification of Contaminated Areas
The study reported that deviations in key water quality parameters, namely pH, DO, EC, and hardness, were spatially mapped and are depicted in
Figure 4a–d. These maps highlight areas of significant environmental and public health concern. The analysis was conducted using ArcMap 10.5. According to the findings, 12.6% of the sampled areas exhibited pH values outside the acceptable limits defined by IS 10500. The pH exceedances were primarily observed in localities such as Chirakhol Toli Slum, PrashantiVihar, and KananVihar Phase 1, and were attributed to sewage contamination and informal waste disposal practices. Similarly, high DO values were detected in localities including Sikharchandi Nagar, Adarsha Vihar, and NiladriVihar. These were explained as the result of anthropogenic pressures, such as construction activities, sewage infiltration, and surface runoff, in urbanized regions. Overall, 21.4% of the areas were found to deviate from standard DO levels.
The WQI for each ward in the northern zone of Bhubaneswar was calculated using the weighted arithmetic index method [
51]. The WQI value for Ward No. 1 is presented in
Table 2. According to the study, detailed water quality analysis results are tabulated in
Table 3, and
Table 4 contains the WQI values for all wards, based on predefined standards. According to these findings, illustrated in
Figure 5, the WQI values fell within a range indicating that the water quality was not only satisfactory but also suitable for drinking purposes. It is reported that 28.6% of the water samples fell within the ‘excellent quality’ category, characterized by WQI values ranging from 0 to 25, as shown in
Table 4. Additionally, 71.4% of the samples were categorized under ‘good quality’, with values ranging from 26 to 50. These results were interpreted as evidence that the water quality in Bhubaneswar has remained largely uncontaminated to date.
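The weighted arithmetic index calculation referred to above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the commonly used formulation in which each parameter's unit weight is inversely proportional to its standard value and each sub-index compares the measured value with an ideal value. The function names, the three-parameter subset, the ideal value of 7.0 for pH, and the sample reading are hypothetical; the standards reuse the thresholds quoted in this section (pH 8.5, EC 300 µS/cm, hardness 200 ppm), and only the two classes reported here (excellent and good) are labelled.

```python
def quality_rating(value, standard, ideal=0.0):
    """Sub-index q_i = 100 * (V_i - V_ideal) / (S_i - V_ideal)."""
    return 100.0 * (value - ideal) / (standard - ideal)


def wawqi(sample, standards, ideals=None):
    """Weighted arithmetic WQI = sum(w_i * q_i) / sum(w_i), with w_i = K / S_i."""
    ideals = ideals or {}
    # Proportionality constant K so that the unit weights sum to 1.
    k = 1.0 / sum(1.0 / s for s in standards.values())
    weighted_sum = 0.0
    weight_total = 0.0
    for name, standard in standards.items():
        weight = k / standard
        rating = quality_rating(sample[name], standard, ideals.get(name, 0.0))
        weighted_sum += weight * rating
        weight_total += weight
    return weighted_sum / weight_total


def classify(wqi):
    """Label only the two classes reported in this section."""
    if wqi <= 25:
        return "excellent"
    if wqi <= 50:
        return "good"
    return "outside the classes reported here"


# Hypothetical inputs for illustration only.
STANDARDS = {"pH": 8.5, "EC": 300.0, "hardness": 200.0}
IDEALS = {"pH": 7.0}  # assumed ideal value; other parameters default to 0
reading = {"pH": 7.3, "EC": 150.0, "hardness": 100.0}

score = wawqi(reading, STANDARDS, IDEALS)
label = classify(score)
```

Because the weights are inversely proportional to the standards, a parameter with a small permissible value (here pH) dominates the index, which is a known property of this formulation.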
It was also noted that deviations in specific parameters, such as pH, EC, DO, and hardness, were mapped using ArcGIS and visualized in
Figure 4a–d. These anomalies were attributed to aging pipeline infrastructure and corrosion within the distribution network, as reported by the Public Health Engineering Department (PHED) of Bhubaneswar.
Further, in the ArcGIS maps (
Figure 4a–d), red circular dots mark the wards where water quality parameter values surpassed critical thresholds: for instance, pH values outside the acceptable range of 6.5–8.5, EC values exceeding 300 µS/cm, and hardness levels greater than 200 ppm. Pink numeric labels adjacent to these dots represent the corresponding location serial numbers of the affected areas. The parameter deviations in some wards were confirmed to be linked to sewage contamination and issues within the plumbing systems. PHED was reported to have taken corrective steps by upgrading and modifying the pipeline infrastructure in the affected areas.
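The threshold rule used to mark locations on the maps can be expressed as a short check. The sketch below is illustrative only: the function name, the dictionary keys, and the sample readings are hypothetical, and only the three thresholds stated above are encoded (no DO limit is given in this section, so DO is omitted).

```python
def exceedances(sample):
    """Return the parameters of one sampling location that breach the stated limits."""
    flagged = []
    if not 6.5 <= sample["pH"] <= 8.5:   # acceptable pH range 6.5-8.5
        flagged.append("pH")
    if sample["EC"] > 300:               # EC limit in µS/cm
        flagged.append("EC")
    if sample["hardness"] > 200:         # hardness limit in ppm
        flagged.append("hardness")
    return flagged


# Hypothetical readings keyed by location serial number.
readings = {
    1: {"pH": 8.9, "EC": 120.0, "hardness": 250.0},
    2: {"pH": 7.2, "EC": 310.0, "hardness": 140.0},
    3: {"pH": 7.0, "EC": 300.0, "hardness": 200.0},
}

# Keep only the locations with at least one breached parameter.
hotspots = {serial: exceedances(r) for serial, r in readings.items() if exceedances(r)}
```

Locations that appear in `hotspots` correspond to the points that would be drawn as red dots on a map.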
3.2. Graphical Presentation of Water Quality Index
The ward-wise WQI was illustrated in
Figure 5. Wards 5 and 8 had WQI values of 48.7 and 45.2, respectively. Ward Nos. 1, 2, 17, 20, and 21 recorded WQI values below 25.0, indicating excellent water quality. The remaining wards fell within the range of 25.1 to 45.0, which corresponds to good water quality.
3.3. Principal Component Analysis
PCA was used to analyze the original monitoring data to reduce computational complexity and identify the influence of various parameters on water quality. The primary objective of PCA was to extract the key representative characteristics of the water environment into a set of independent variables known as principal components. PCA identified correlations among the geochemical data, contributing to the understanding of the depositional climate. In this study, PCA was performed on eight physicochemical water quality parameters across 26 wards of Bhubaneswar city.
As shown in
Table 4, the correlation coefficient matrix obtained using OriginPro 2023 software revealed a strong positive correlation between Total Dissolved Solids (TDSs) and Hardness (r = 0.8), commonly attributed to the presence of calcium and magnesium salts. EC and Hardness also showed a strong correlation (r ≈ 0.8), indicating that divalent ions largely influence the ionic strength in water. Moderate positive correlations were observed between pH and EC, pH and Alkalinity, as well as DO and Hardness. These relationships suggest that pH regulation is closely linked to the buffering effect of dissolved carbonates, while DO variability may be influenced by interactions with metal ions and biological activity. These inter-parameter dependencies reflect a complex interplay of hydrogeochemical and anthropogenic factors, highlighting the importance of integrated interpretation of water quality indicators.
Positive correlations exceeding 0.58 were also noted among pH, conductivity, hardness, and alkalinity.
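For readers who wish to reproduce an entry of such a correlation matrix, the Pearson coefficient between two parameters can be computed as below. This is a generic sketch, not the OriginPro procedure used in the study; the function name and the paired readings are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical paired readings (mg/L) for TDS and hardness at five locations.
tds = [44.0, 120.5, 180.0, 250.3, 312.2]
hardness = [30.0, 70.0, 95.0, 160.0, 175.0]
r_tds_hardness = pearson_r(tds, hardness)
```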
The scree plot presented in
Figure 6 illustrates that the slope became flatter after the third principal component, indicating diminishing variance contributions. The first principal component (PC1) accounted for 48.0% of the variance and showed strong positive loadings from pH (0.4), conductivity (0.5), hardness (0.5), and TDSs (0.5). The second principal component (PC2), which explained approximately 17.0% of the total variance, was primarily influenced by alkalinity (loading of 0.6) and was negatively associated with DO (loading magnitude of 0.7). The loadings of PC1 and PC2 indicated the dominance of ionic parameters, likely linked to the presence of calcium and magnesium salts in the water.
As detailed in
Table 5, PC1 contributed 48.0% of the total variation, with notable inputs from EC, hardness, and pH, while PC2 explained 17.0% and was dominated by alkalinity and DO. Chloride, TDS, and BOD made comparatively smaller contributions.
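The way a principal component's loadings and its share of the variance are obtained can be sketched in a few lines. The example below is a simplified illustration and not the OriginPro workflow used in the study: it assumes PCA on the correlation matrix of standardized parameters and extracts only the first component by power iteration. All function names and the sample columns are hypothetical.

```python
import math


def standardize(columns):
    """Scale each parameter column to zero mean and unit (population) variance."""
    scaled = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        scaled.append([(v - mean) / std for v in col])
    return scaled


def correlation_matrix(columns):
    """Correlation matrix of the parameter columns."""
    z = standardize(columns)
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def first_component(matrix, iterations=200):
    """Power iteration: leading eigenvector (PC1 loadings) and its eigenvalue."""
    p = len(matrix)
    # Non-uniform start vector so it is unlikely to be orthogonal to PC1.
    vec = [1.0 / (i + 1) for i in range(p)]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    product = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(v * w for v, w in zip(vec, product))  # Rayleigh quotient
    return vec, eigenvalue


def explained_variance_ratio(matrix, eigenvalue):
    """Share of total variance: eigenvalue divided by the matrix trace."""
    return eigenvalue / sum(matrix[i][i] for i in range(len(matrix)))


# Hypothetical readings for three parameters at five locations.
ec = [120.0, 180.0, 240.0, 300.0, 150.0]
hardness = [60.0, 95.0, 130.0, 170.0, 80.0]
do = [6.8, 6.1, 5.9, 5.2, 6.5]

corr = correlation_matrix([ec, hardness, do])
loadings, eigenvalue = first_component(corr)
pc1_share = explained_variance_ratio(corr, eigenvalue)
```

The sign of an eigenvector is arbitrary, so loadings should be interpreted by their relative signs and magnitudes. A full analysis would extract all components, for example with an eigendecomposition routine, before drawing a scree plot.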
The scores plot in
Figure 7 identified four groups of wards. Group 1 lay negatively along PC1 and positively along PC2, whereas Group 4 appeared in the opposite quadrant. Group 2 displayed a strong positive association with both components, suggesting a significant ionic influence. Groups 3, 1, and 17 occupied the negative quadrant for both PC1 and PC2, indicating lower concentrations of metal ions. These spatial groupings and component loadings reflect underlying hydro-chemical interactions, supporting policy-oriented interventions as discussed in
Section 4.
Model performance was evaluated using 5-fold Stratified Cross-Validation (CV) to ensure balanced testing and result robustness. Metrics such as accuracy, precision, recall, and F1-score were computed for each fold and summarized with their mean and standard deviation. The validation process confirmed consistent model performance across all subsets (
Table 6). Six classifiers were tested: Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), k-Nearest Neighbors (KNN), and Naïve Bayes (NB). DT and RF achieved the highest accuracy of 91.7%, with RF also leading in precision at 92.7%. SVM, KNN, and NB followed closely, with accuracies of around 89.6%, while LR recorded the lowest accuracy at 87.5%. These findings underscore the strength of tree-based models for water quality classification, offering practical tools for predictive assessment and environmental monitoring.
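The evaluation protocol described above (stratified folds, per-fold metrics, then mean and standard deviation) can be sketched independently of any particular classifier. The code below is a minimal illustration using only the Python standard library; it does not train a model, and the function names, the label vector, and the per-fold scores are hypothetical.

```python
import random
import statistics
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds while balancing each class across folds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal class members round-robin
    return folds


def fold_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall, and F1 for one fold (one class as positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def mean_and_std(values):
    """Summarize per-fold scores as (mean, sample standard deviation)."""
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical class labels for illustration only.
labels = ["good"] * 15 + ["excellent"] * 6
folds = stratified_folds(labels, k=5)

# Hypothetical per-fold accuracies from some classifier.
summary = mean_and_std([0.90, 0.95, 0.90, 0.85, 0.95])
```

In each round, one fold serves as the test set and the remaining four as the training set, so every sample is tested exactly once.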
Hyperparameter Tuning and Function Selection
In
Table 7, the hyperparameter settings for all six classifiers were retained at their respective library defaults to establish a baseline for performance comparison. For Logistic Regression, an ℓ₂ penalty was applied with a regularization strength of C = 1.0, solved using the ‘lbfgs’ optimizer in multiclass mode, with max_iter set to 100 and a tolerance of 1 × 10⁻⁴. The Decision Tree model operated using the Gini impurity criterion, with no depth restriction, a minimum of two samples required to split an internal node, and a minimum of one sample per leaf. The Random Forest ensemble consisted of 100 bootstrap-aggregated trees, each constructed using Gini impurity, with no depth limit, and automatic feature selection applied per split. The SVM classifier utilized a radial basis function (RBF) kernel (C = 1.0, γ = scale, tol = 1 × 10⁻³). The K-Nearest Neighbors algorithm evaluated the five nearest neighbors using Euclidean distance (p = 2) and employed a uniform voting strategy. Lastly, Gaussian Naïve Bayes was implemented with no class priors and a variance smoothing parameter of 1 × 10⁻⁹. These configurations provided consistent, reproducible conditions for evaluating and benchmarking classification performance before any further hyperparameter tuning.
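The baseline settings listed above can be collected in a single configuration mapping, which keeps them in one place when results are re-run. The text does not name the library; the keys below follow scikit-learn parameter names as an assumption, based on the ‘lbfgs’ and max_iter terminology. The dictionary name and the model labels are hypothetical, and "sqrt" for the Random Forest is an assumed reading of "automatic feature selection".

```python
# Baseline hyperparameters as described in the text, keyed by assumed
# scikit-learn parameter names. No library import is needed to hold the values.
BASELINE_PARAMS = {
    "LogisticRegression": {
        "penalty": "l2", "C": 1.0, "solver": "lbfgs",
        "max_iter": 100, "tol": 1e-4,
    },
    "DecisionTree": {
        "criterion": "gini", "max_depth": None,
        "min_samples_split": 2, "min_samples_leaf": 1,
    },
    "RandomForest": {
        "n_estimators": 100, "criterion": "gini", "max_depth": None,
        "bootstrap": True,
        "max_features": "sqrt",  # assumption: "automatic" read as sqrt(n_features)
    },
    "SVM": {"kernel": "rbf", "C": 1.0, "gamma": "scale", "tol": 1e-3},
    "KNN": {"n_neighbors": 5, "p": 2, "weights": "uniform"},
    "GaussianNB": {"priors": None, "var_smoothing": 1e-9},
}
```

Each inner dictionary could be unpacked into the corresponding estimator's constructor, provided the installed library version accepts those parameter names.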
4. Discussion
The paper reported that the overall quality of tap water in Bhubaneswar was generally good, with most of the surveyed wards falling in the “good to excellent” range on the WQI. It stated that 28.6% of the areas had been assessed to have excellent water quality, while 71.4% were categorized as having good water quality. According to the findings, this indicated that a safe municipal supply system was generally in place. The variations in pH in Ward Nos. 26 and 20 were likely caused by factors such as leaching of organic matter, bacterial activity, and occasional use of fertilizers in gardens. Additionally, acidity could have resulted from elevated CO₂ due to decomposition of organic matter, adsorption of metal anions, and the presence of non-metallic compounds such as fluoride. Deviations in pH levels are known to irritate the skin and exacerbate health issues such as eczema and gastrointestinal distress [
52]. The elevated pH levels in localities such as Chirakhol Toli Slum and Prashanti Vihar (Ward 1) could be indicative of waste disposal issues or a reaction to alkaline buffering agents within the pipeline network [
53].
In addition, the elevated DO levels in Sikharchandi Nagar might suggest natural reaeration; however, in areas with aging pipes, such levels could also indicate organic contamination [
54]. It is suggested that factors such as geomorphological characteristics and anthropogenic activities contributed to pollution, which could potentially decrease the DO concentration to levels below those considered essential [
55]. It was further explained that particles such as silts, clays, and sewage debris absorb sunlight, thereby increasing the water surface temperature and resulting in a reduction of DO levels. This decline in DO was reported to carry significant health implications for aquatic plants and animals, which depend on adequate levels of DO for survival and vital metabolic processes [
56].
The elevated EC levels in PHD Colony, Chakeisiani Area, and Satya Nagar (all from Ward 5) reflected the impact of industrial operations, urban development, and improper waste management, which introduced pollutants such as chloride, sulphate, and nitrate into water bodies; these pollutants could cause cancer of the colon and rectum [
57]. It was reported that elevated EC levels in drinking water may have health implications, particularly due to excessive intake of dissolved minerals and salts, and prolonged consumption could increase the risk of hypertension and cardiovascular diseases. The study area contains significant mineral deposits; water percolating through the soil could dissolve these minerals, thereby increasing alkalinity. Nevertheless, alkalinity levels remained within the permissible limits set by IS 10500 across all surveyed locations [
58].
It is highlighted that excessive water hardness in wards such as Mancheswar Village (ward 5) necessitates water softening [
59]. The elevated hardness is most likely influenced by the leaching of calcium and magnesium from corroded pipeline materials, combined with subsurface geogenic contributions, such as the dissolution of carbonate minerals, including calcite and dolomite, present in the underlying geological formations [
60]. The study also noted that exposure to water with high chloride concentrations could have adverse effects on the skin, and chloride ions are highly polarizable in water, which can accelerate corrosion reactions [
61]. It was found that chloride ions degrade reinforced concrete (e.g., in bridges), contributing to structural ageing, and also cause corrosion in boiler systems by eroding pipes exposed to chloride-rich steam [
62].
Temperature-dependent processes, such as ion exchange and desorption, played a crucial role in increasing TDS levels and contributed to an unpleasant taste in drinking water. The study also highlighted that consumption of such pipe water with elevated TDS levels could lead to gastrointestinal discomfort, cardiac problems, and kidney stone formation [
63]. The PCA demonstrated marked positive correlations among key hydro-chemical parameters, especially TDS with hardness and EC with hardness [
64]. These relationships reflect the contributions of mineral dissolution, ion concentration resulting from human activities, and aging delivery systems that add ions to the supplied tap water [
65]. The widespread mapping in ArcGIS proved essential for visualizing contamination hotspots, facilitating the recognition of primary zones for action [
66]. This combined approach serves as a water-quality management tool for metropolitan and town planners by providing reliable, scientifically based, and prioritized evidence of the need to improve water supply systems [
67].
Machine learning coupled with geo-spatial inputs improves urban tap water quality assessment by converting complex physicochemical data into accurate predictive models [
68]. Combining traditional methods (WQI, PCA) with supervised classifiers, the study overcomes static thresholds and fragmented monitoring, achieving over 91% accuracy in classifying water quality across 21 wards [
66]. This scalable, real-time framework supports data-driven decisions for the safeguarding of drinking water in rapidly urbanizing North Bhubaneswar [
69]. The findings were validated by the Public Health Engineering Department (PHED), Government of Odisha, which recommended addressing encroachment issues in various wards.
Although the study itself does not directly enhance water quality, it provides a thorough assessment of the evaluated water conditions, enabling stakeholders to formulate appropriate water management practices [
70]. These findings further justify protective measures, such as continuous monitoring, controlled governance of wastewater, and the maintenance of distribution systems, to ensure the safety of urban water supplies following rapid urbanization [
71]. Moreover, although the laboratory analysis in this case study was conducted according to the APHA 2012 guidelines, newer APHA editions, such as those of 2017 and 2022, are noted as prospective references for future research aimed at meeting changing international benchmarks [
72,
73,
74].
4.1. Comparison and Implications
Studies conducted in metropolitan cities in India revealed higher WAWQI scores, which indicate poorer water quality. In contrast, the study area falls within the “good” to “excellent” range, although improved water quality management and attention to environmental and social factors remain necessary to achieve a sustainable water supply [
75]. The findings have significant implications for decision-makers, environmentalists, academics, and other stakeholders, as the results underscore the need for interventions to address localized contamination hotspots and enhance wastewater treatment efficiency, thereby ensuring an adequate water supply infrastructure [
76].
Furthermore, the strong correlation (r = 0.8) between TDS and hardness, as well as their dominant influence in PC1 (48.0% variance), highlights the need for targeted interventions. These parameters, often influenced by calcium and magnesium salts, contribute to pipeline scaling and degradation of water quality. Policymakers should consider regular monitoring of ionic concentrations, implementing localized softening units in high-risk wards, and prioritizing infrastructure audits based on PCA-derived ward groupings.
4.2. Study Limitations
Despite the comprehensive nature of this study, certain limitations provide opportunities for further research. Firstly, the analysis was restricted to the North Zone of Bhubaneswar, which may not fully capture the spatial variability of water quality across the entire city. Extending the assessment to include the Southeast and Southwest zones would enable a more holistic understanding of municipal water supply conditions and help identify zone-specific vulnerabilities. Secondly, the scope was limited to physicochemical parameters, and a detailed microbiological evaluation focusing on bacteriological activity and potential microbial contamination within the water supply network is recommended to complement the current findings [
77]. Finally, incorporating advanced statistical approaches, such as multivariate analysis and time-series modelling, could improve data interpretation by uncovering deeper spatial-temporal patterns in water quality variation, thereby supporting more precise and informed decision-making [
78].
4.3. Recommendation and Suggestion
The enhancement of waste management practices through segregation, recycling, and proper disposal of industrial and hazardous waste, as well as the upgrading of water treatment facilities with modern technologies, plays a pivotal role. Establishing real-time water quality monitoring systems, strengthening enforcement of water pollution regulations, and conducting regular audits and inspections are also essential. Further work is required on specific pollutants, long-term water quality monitoring, and the development of advanced assessment methods integrated with statistical models and machine learning algorithms. Additionally, research priorities should include health risk assessment and evaluation of water management policies to inform evidence-based decision-making and ensure a safe and sustainable water supply in Bhubaneswar. Based on PCA findings, introducing ion-specific softening modules in municipal treatment plants and tracking variations in mineral load could further enhance water quality control.
4.4. Conclusion
The assessment of tap water quality revealed that most wards exhibited favorable water quality, ranging from good to excellent classifications. The WQI calculations showed that 28.6% of the regions had excellent water quality, while 71.4% fell under the good water quality range. The analysis of individual water quality parameters revealed localized variations, with specific areas exhibiting high pH values, elevated DO levels, or high hardness levels. These findings provide a comprehensive understanding of the current state of tap water quality in the study area, serving as a valuable resource for stakeholders and informing future water management strategies.
This study highlights the strength of combining geospatial tools and predictive modeling for spatial water quality assessment, examining the complex relationships between climate variability and hydrological responses. Based on these results, we recommend that policymakers and stakeholders take initiatives to mitigate the adverse impacts of water pollution in pollution-prone areas, which will promote sustainable water management and ecosystem resilience.