1. Introduction
Water management influences many aspects of our lives, including the environment, food production, irrigation, and energy generation [1]. One of the world’s most pressing challenges is the scarcity of safe water, which is quickly dwindling due to climate change, contamination, and pollution. With the explosive rise in the world’s population, efficient and smart water resource monitoring methods are becoming particularly crucial. Smart water monitoring is described as applying various computational approaches to offer users appropriate tools and information for water network supervision, control, analysis, and optimization [2]. Several water management solutions have been developed that implement the most recent advances in information technology to address this issue, but they are costly and energy-intensive. Recently, the quest for a smart water management system has been gaining traction with the birth of the Internet of Things (IoT) [3,4,5]. IoT technology has risen to prominence in a number of vital sectors in recent years, owing to its enhanced capabilities and competitive benefits [6,7,8]. The IoT allows data to be gathered and analyzed in its environment, thus offering intelligent applications in a wide range of areas, notably water management. In this context, the IoT refers to a network of sensing devices that gather and monitor water data, which are then transmitted to computing systems for analysis. IoT-based water management systems are low-cost and energy-efficient solutions that can be easily expanded while allowing effective remote monitoring and control [4].
In this context, an IoT-centered solution is extremely advantageous since it enables the control of water quality and the optimization of safe water usage through intelligent corrective actions and policies. However, one research direction that could improve the effectiveness of this solution is how to deal with uncertainty when confronted with inaccurate or erroneous water data. Uncertainty is a pervasive feature in real-world scenarios and significantly impacts the quality of the information we can gather from data [9]. In many applications, such as water resource management, the presence of uncertainty can lead to incorrect decisions or suboptimal performance. In fact, multiple sources of uncertainty exist in water environments, which can introduce bias and inaccuracy into our analysis and decision making if not appropriately addressed. According to [10], various factors contribute to the uncertainty in water data, such as pressure levels, degree of leakage, imprecise calibration of monitoring equipment, and the uncertainties associated with modeling complex water systems. Access to water reserves and flows is frequently challenging and highly unpredictable over time and space. Rivers flowing through vegetation or beneath the ice, water moving through porous soil structures and rock fissures, and isolated rainstorms from thunderclouds are just a few examples of these issues. In addition, uncertainty also arises from quantification issues related to errors in water sampling procedures, chemical and biological analyses, water quality indicators, and the assessment of the state of water zones [11]. Addressing the uncertainty of water data is therefore crucial to ensure reliable and accurate findings. It can prevent parameter bias, remove irrelevant data, and enhance water model performance evaluation [10]. Hence, when developing intelligent systems for applications related to water management, it is essential to incorporate methods that can effectively handle and manage the uncertainty of the data.
To address this challenge, we propose the use of uncertain knowledge graph embedding (UKGE) techniques. These extend the traditional knowledge graph embedding methods by modeling the uncertainty of the data, which allows decisions to take that uncertainty into account. For example, in water resource management, UKGE can be used to detect anomalies in the water quality that may be difficult to detect using traditional methods. Additionally, UKGE can improve the performance of downstream tasks, such as classification, clustering, anomaly detection, and link prediction.
In this study, we propose combining network representation learning with uncertainty handling methods to ensure a rich modeling and efficient management of the water environment. The main contributions of the proposed approach include the following:
An uncertainty-aware modeling of the smart water environment that quantifies and incorporates uncertainty factors into the water information network (WIN);
An uncertain embedding of the WIN combining probabilistic and network representation learning (NRL) models to ensure the learning and classification of representations of water information entities under uncertainty of the monitored data;
An uncertainty-aware decision mechanism that applies the evidence theory and consists of querying the uncertain WIN to select the suitable management actions for each class of affected water zones.
The remainder of this paper is organized as follows: In Section 2, we review the current IoT solutions to deal with water management issues. Then, in Section 2.2, we briefly present SmartWater, our previous sensor cloud-based framework. In Section 3, we discuss the impact of the uncertainty factors on the effectiveness of water management operations. Section 3.1 presents an uncertainty-aware modeling and representation learning of the water information network. It also presents a decision mechanism that exploits the learned representations in triggering appropriate water management plans. Section 4 provides extensive experiments on the proposed approach. The last section is devoted to the conclusion and future work.
3. Uncertainty Handling in Water Environments
Despite the dramatically increasing number of water monitoring approaches, most ignore uncertainty factors (e.g., pressure level, leakage degree, imprecise calibration of monitoring equipment, uncertainties associated with the modeling of complex water systems, inaccurate sensing, and incorrect or missing measurements). Such uncertainty must be considered during the water monitoring and network embedding process. Uncertainty is a natural feature of many forms of knowledge. In real-world uncertain knowledge graphs such as ConceptNet, NELL, and ProBase, relations and facts are associated with a confidence score [30]. Currently, there are few alternatives to capture uncertainty information with knowledge graph embeddings [28,29]. To achieve the goal of water monitoring under uncertain water zones’ contexts, it is important to encode additional information (e.g., truth degrees of water measurements) to preserve uncertainty. Probabilistic models have gained widespread acceptance in different domains, particularly recommender systems [31,32,33,34,35]. Probabilistic knowledge graph embedding has also been applied in some domain-independent approaches [28,29]. However, uncertain and probabilistic embeddings have not yet been exploited in the field of water monitoring. Therefore, incorporating such models (e.g., latent probabilistic models, latent Dirichlet allocation, probabilistic matrix factorization, probability relevance, and probability ranking principles) into water monitoring systems would be a promising approach.
The present work aims at improving our smart water monitoring system by incorporating uncertainty into the monitoring process. An uncertain water information network, also called UWIN (see Section 3.1), will represent knowledge as a set of facts denoting the contextual relations defined over water entities. The UWIN will also contain uncertain facts and will provide a confidence score along with each contextual relation between water entities and sensors. This approach considers the UWIN as a set of probabilistic facts: each relation between two entities in the UWIN (e.g., reservoir, sensor, pipeline) is represented with a probability value. The probabilistic construction of the UWIN addresses the uncertainty of water zones’ information, allowing for a more accurate prediction of their states. We adopt a probabilistic graph embedding method to approximate these probabilities and provide recommendations for the appropriate water management actions. In this work, we define a model for uncertain knowledge graph embedding that preserves both the structural relationship information and the uncertainty information of contextual relations between water entities in the embedding space. The UWIN model learns the embeddings according to the truth degrees of uncertain contextual relations, such as water measurements. In this case, the prediction step consists of forecasting the water quality probability to determine suitable recommendations for actions.
3.1. Modeling of Uncertain Embedding of the Water Information Network
3.1.1. Uncertainty Quantification
Water management systems are often subject to uncertainties, and several uncertainty factors may affect the decision quality in a water monitoring system. For example, uncertainty may stem from (i) the input data, caused by inaccurate measurements, missing values, spatial interpolations, temporal aggregation, and assumptions in boundary and initial conditions; or (ii) the parameters, caused by natural variability, lack or inadequacy of observations, calibration techniques, etc. Monitoring instruments and sensors may also be subject to failures, calibration errors, or unstable behavior, which may affect the monitoring records. That includes the inaccurate measurement of water temperature, of turbidity (which is used to determine the clarity of the water), of TDO (Total Dissolved Oxygen) and pollution levels, and errors in measuring pump rate and pressure. Other important sources of uncertainty concern the insufficient number and geographical spread of sensors, the sampling (i.e., sampling location and frequency), and analytical uncertainties. Hence, an incomplete understanding of the water zones’ states will lead to inappropriate decisions.
The above uncertainty factors and sources must be considered when constructing the water information network (see Section 2.2) and updating it after each monitoring time frame, thus treating it as an uncertain knowledge graph.
The uncertainty related to parameters in the $WKG$ has two forms: aleatory and epistemic. The first refers to a random event’s natural variability, while the second depicts a lack of knowledge. In this paper, uncertainty related to parameters is propagated using belief function theory [36,37]. This theory is effective for modeling and processing aleatory and epistemic uncertainty in a very natural way [38]. To better understand the mechanism of the evidence theory, we will start by explaining the core concepts of this theory, namely the basic belief assignment, uncertain parameters, and propagation of the parameter uncertainty. The main advantages of evidence theory include its ability to handle both aleatory and epistemic uncertainty, its ability to propagate uncertainty in a rigorous and efficient manner, and its ability to incorporate expert knowledge into the uncertainty modeling process. Additionally, evidence theory can provide a measure of the reliability of the results obtained, allowing decision makers to evaluate the level of confidence in the decision-making process. Overall, the use of evidence theory can lead to more accurate and robust decision making in the face of uncertainty.
Definition 2. (Basic belief assignment (BBA)) Let $\mathsf{\Theta}=\{{\mathcal{C}}_{1},\dots ,{\mathcal{C}}_{n}\}$ be a finite set of mutually exclusive and exhaustive classes of water quality, called the frame of discernment. A BBA is a function $m:{2}^{\mathsf{\Theta}}\to [0,1]$ that assigns a mass to each proposition $\mathcal{A}\subseteq \mathsf{\Theta}$ and satisfies $m\left(\mathcal{A}\right)\ge 0$, $m(\varnothing )=0$, and ${\sum}_{\mathcal{A}\subseteq \mathsf{\Theta}}m\left(\mathcal{A}\right)=1$.
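The three conditions of Definition 2 can be checked mechanically. The following is a minimal sketch, assuming a hypothetical frame of three water quality classes and illustrative mass values; the function names and the class labels are not part of the proposed approach.

```python
from itertools import chain, combinations


def powerset(frame):
    """Return all subsets of the frame of discernment as frozensets (2^Theta)."""
    items = list(frame)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


def is_valid_bba(m, frame, tol=1e-9):
    """Check the BBA conditions: focal sets within Theta, m(A) >= 0,
    m(empty set) = 0, and the masses sum to 1."""
    frame = frozenset(frame)
    if any(not focal <= frame for focal in m):
        return False
    if any(mass < 0 for mass in m.values()):
        return False
    if m.get(frozenset(), 0.0) != 0.0:
        return False
    return abs(sum(m.values()) - 1.0) <= tol


# Hypothetical frame and masses, for illustration only.
THETA = {"good", "moderate", "poor"}
m = {
    frozenset({"good"}): 0.6,
    frozenset({"moderate", "poor"}): 0.3,  # evidence that cannot separate two classes
    frozenset(THETA): 0.1,                 # mass left on total ignorance
}
```

Assigning mass to non-singleton subsets, as in the second and third entries, is what lets a BBA express a lack of knowledge rather than only randomness.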
Definition 3. (Uncertain parameters) Epistemic parameters are bounded in a vector $e\in {\mathbb{R}}^{n}$, where each ${e}_{i}$ $(i\in \{1,\dots ,n\})$ takes values in an interval $[{e}_{i}^{L},{e}_{i}^{U}]$, with a BPA (basic probability assignment) structure defined as $[{e}_{1}^{L},{e}_{1}^{U}]/{m}_{1},\dots ,[{e}_{n}^{L},{e}_{n}^{U}]/{m}_{n}$. Aleatory parameters ${a}_{j}$ $(j\in \{1,2,\dots ,m\})$ form a random vector $a\in {\mathbb{R}}^{m}$ with a normal probability distribution, $a\sim \mathcal{N}(\mu ,\sigma )$, where μ is the mean and σ is the standard deviation.
The belief function theory only considers intervals with an associated mass as input. Therefore, aleatory parameters are transformed into intervals with associated mass values $[\mu -\xi \sigma ,\mu +\xi \sigma ]$. Then, these intervals are discretized into N subintervals $[{a}_{i}^{L},{a}_{i}^{U}]$, where $m\left({a}_{i}\right)={\int}_{{a}_{i}^{L}}^{{a}_{i}^{U}}f\left(x\right)dx$ and $f\left(x\right)$ is the probability density function (pdf) of x depicted by Equation (1).
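The discretization step can be sketched as follows. This is an illustration under stated assumptions: the pdf is taken to be the normal density implied by Definition 3, the subintervals are of equal width, and the masses are renormalized so that they sum to one after truncation to the interval; the renormalization and the default values of `xi` and `n` are choices made for this sketch, not taken from the paper.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal variable."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def discretize_aleatory(mu, sigma, xi=3.0, n=6):
    """Split [mu - xi*sigma, mu + xi*sigma] into n equal subintervals and
    attach to each the probability mass of the normal pdf over it."""
    lo, hi = mu - xi * sigma, mu + xi * sigma
    width = (hi - lo) / n
    edges = [lo + i * width for i in range(n + 1)]
    raw = [
        normal_cdf(edges[i + 1], mu, sigma) - normal_cdf(edges[i], mu, sigma)
        for i in range(n)
    ]
    total = sum(raw)  # slightly below 1 because the tails are cut off
    # Assumption of this sketch: renormalize so the masses form a valid BPA.
    return [((edges[i], edges[i + 1]), raw[i] / total) for i in range(n)]


# Hypothetical pH sensor reading with mean 7.0 and standard deviation 0.5.
bpa = discretize_aleatory(mu=7.0, sigma=0.5, xi=3.0, n=6)
```

Each returned pair is an interval with its mass, which is the input format expected by belief function theory.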
After computing the BPA structures for the uncertain parameters of the $WKG$, they are integrated into a joint structure, computed as a Cartesian product ${c}_{ij}={a}_{i}\times {e}_{j}$. The BPA of each ${c}_{ij}$ is determined according to the following equation: $m\left({c}_{ij}\right)=m\left({a}_{i}\right)\times m\left({e}_{j}\right)$. The responses of the $WKG$ model are estimated as follows: $[{Y}_{min},{Y}_{max}]=[{\min}_{x\in {c}_{ij}}f\left(x\right),{\max}_{x\in {c}_{ij}}f\left(x\right)]$.
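The joint structure and the response bounds can be illustrated with a small sketch. The interval values, the toy response model, and the grid search used to approximate the minimum and maximum are all assumptions made for illustration; the paper does not specify how the bounds are searched.

```python
from itertools import product


def joint_bpa(aleatory, epistemic):
    """Cartesian product c_ij = a_i x e_j with m(c_ij) = m(a_i) * m(e_j)."""
    return [
        ((a_int, e_int), m_a * m_e)
        for (a_int, m_a), (e_int, m_e) in product(aleatory, epistemic)
    ]


def response_bounds(model, cell, steps=20):
    """Approximate [Y_min, Y_max] of the model over one cell by a grid search
    (a simple stand-in for a proper bounded optimizer)."""
    (a_lo, a_hi), (e_lo, e_hi) = cell
    values = [
        model(a_lo + (a_hi - a_lo) * i / steps, e_lo + (e_hi - e_lo) * j / steps)
        for i in range(steps + 1)
        for j in range(steps + 1)
    ]
    return min(values), max(values)


def toy_model(a, e):
    """Hypothetical response function, used only to exercise the sketch."""
    return a + 2.0 * e


# Hypothetical BPA structures: (interval, mass) pairs.
aleatory = [((6.5, 7.0), 0.5), ((7.0, 7.5), 0.5)]
epistemic = [((0.0, 1.0), 0.7), ((1.0, 2.0), 0.3)]

joint = joint_bpa(aleatory, epistemic)
bounds = [(response_bounds(toy_model, cell), mass) for cell, mass in joint]
```

Each entry of `bounds` pairs a response interval with the mass of the cell that produced it, so the output is again a BPA structure over the model response.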
3.1.2. Water Network Modeling
A first step towards the efficient management of water zones is the accurate monitoring of their state. This task must be preceded by an explicit representation of each water zone’s elements. However, the complexity of the water network coupled with the deviations of sensing objects makes smart monitoring a challenging task. Moreover, sensors may provide incorrect, inaccurate, or incomplete monitoring data, adding a new uncertainty factor regarding the water zones’ state. By treating the water network as an uncertain information network, the present work aims to endow water monitoring systems with uncertainty-handling capabilities. To achieve this goal, we first model the water information network as an uncertain knowledge graph. Leading companies (e.g., Facebook, Amazon, Yahoo) have successfully adopted knowledge graph technology, improving service consumers’ quality of experience [39]. However, this kind of knowledge base still does not support uncertain knowledge, as it models its elements semantically through multi-relational facts that are assumed to be valid. To solve this issue in the context of smart water monitoring, each relation and feature in the water information network is characterized by a set of values denoting its truth degree. Entities such as water stations, sensors, and management policies are key components of water zones. However, some of them may be characterized by inaccurate information, which leads to a lack of understanding of the water zones’ state.
Definition 4. An Uncertain Water Information Network is a heterogeneous graph structure $\mathcal{G}=(\mathcal{V},\mathcal{E},\mathcal{F},{\mathcal{D}}^{+},\mathcal{P})$, where $\mathcal{V}=<{\mathcal{V}}_{s},{\mathcal{V}}_{c},{\mathcal{V}}_{f},{\mathcal{V}}_{z}>$ is the set of nodes denoting the entities in a water zone (sensors, anomalies, management policies), edges in $\mathcal{E}$ denote the relations between the water entities, and the set $\mathcal{F}$ represents the features characterizing water entities. ${\mathcal{D}}^{+}=\left\{({e}_{i},r,{e}_{j})\right\}$ is the set of weighted/uncertain facts (triples) in $\mathcal{G}$. Each of these is a 4-tuple $f=({v}_{i},r,{v}_{j},l)$, where the heads and tails ${v}_{i},{v}_{j}\in \mathcal{V}$ correspond to the water network entities (e.g., sensors, anomalies, monitoring hubs, distribution pipelines, management policies), $r\in \mathcal{E}$ is a relation between ${v}_{i}$ and ${v}_{j}$, and $l={P}_{ij}$ is the confidence score (truth degree) denoting the probability that the relation between ${v}_{i}$ and ${v}_{j}$ is valid and exists in the UWIN. A confidence score is a value $p\in \mathcal{P}$, where $0\le p\le 1$. A relation $r:{v}_{i}\stackrel{r}{\to}{v}_{j}\in \mathcal{E}$ in the WKG is a typed link (e.g., Monitor, ManagedBy, Trigger) between entities ${v}_{i}$ and ${v}_{j}$.
In the present work, uncertainty is handled at two levels. At the monitoring level, the collected data could be inaccurate or incorrect (e.g., a range of observed behaviors in a pump station), which requires computing the probability that an observation is true. At the water information network level, a fact’s validity has a truth degree, also called a confidence score. Taking the example of the fact <Pollution, ManagedBy, SedimentRemoval>, a high confidence score of this triple ($\uparrow 1$) means a high probability of triggering a sediment removal action in response to detected pollution in a water zone. Conversely, a low confidence score ($\downarrow 0$) recommends excluding the sediment removal action from a water management plan. However, the structure of the UWIN at a given time depends on the truth of monitored data. For example, several pH levels could be observed in one water zone (e.g., (7: pure, 10: detergent, 12: bleach)). In such a situation, we have three possible worlds for the UWIN (see Figure 1). The first level returned by sensors reflects a pure water state, while the two other levels require triggering a water management plan.
A possible world of a water information network $\mathcal{G}$ is a deterministic graph $\mathcal{W}=(\mathcal{V},{\mathcal{E}}_{w})$, where ${\mathcal{E}}_{w}\subseteq \mathcal{E}$. Hence, given a water zone’s state, the corresponding possible world $\mathcal{W}$ is defined by the following probability:
Taking the example of the UWIN in Figure 1, the probability of $\mathcal{W}$ is computed as follows: $\mathcal{W}=\{({e}_{1},{a}_{1}),({e}_{1},{a}_{2}),({e}_{2},{a}_{2}),({e}_{3},{a}_{2})\}$ with probability $P\left(\mathcal{W}\right)={P}_{{e}_{1}{a}_{1}}{P}_{{e}_{1}{a}_{2}}{P}_{{e}_{2}{a}_{2}}{P}_{{e}_{3}{a}_{2}}(1-{P}_{{a}_{1}{e}_{2}})(1-{P}_{{e}_{1}{a}_{2}})=0.81\times 0.67\times 0.57\times 0.43\times 0.33\times 0.33\approx 0.0145$.
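The worked product above follows the usual possible-world semantics for uncertain graphs: the probabilities of the edges present in the world are multiplied by the complements of the probabilities of the absent edges. The sketch below assumes that form and assumes independent edges. The edge labels of the two absent edges are placeholders chosen for illustration; only the numerical factors are taken from the example.

```python
def possible_world_probability(edge_probs, present):
    """Probability of a possible world under independent edges:
    product of P(e) for present edges and (1 - P(e)) for absent ones."""
    p = 1.0
    for edge, prob in edge_probs.items():
        p *= prob if edge in present else (1.0 - prob)
    return p


# Factors from the worked example; the two absent-edge labels are placeholders.
edge_probs = {
    ("e1", "a1"): 0.81,
    ("e1", "a2"): 0.67,
    ("e2", "a2"): 0.57,
    ("e3", "a2"): 0.43,
    ("e2", "a1"): 0.67,  # absent in this world, contributes 1 - 0.67
    ("e3", "a1"): 0.67,  # absent in this world, contributes 1 - 0.67
}
world = {("e1", "a1"), ("e1", "a2"), ("e2", "a2"), ("e3", "a2")}

p_world = possible_world_probability(edge_probs, world)
```

Because the number of possible worlds grows exponentially with the number of uncertain edges, enumerating them is only practical for small illustrations such as this one.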
To identify the correct triggering situations and to ensure accurate querying of the UWIN, we propose a three-step process that consists of (1) reasoning over the uncertain monitoring data, (2) embedding uncertain facts in the UWIN, and (3) based on a classification of water zones, mapping the most likely observations and facts in the embedding vector space to the suitable corrective measures.
The water information network is first populated, then updated, by considering the new features of each water zone. The updated WIN in Figure 2 shows that the probability values of different features (KPI metrics) are represented by green nodes, while the weighted relations represent the probability associated with each KPI feature, such as pressure and pH. The representation of highly uncertain water environments will facilitate and accelerate the selection of corrective actions. That is achieved by adopting an uncertain classification of the WIN nodes with similar features/states (e.g., water zones with poor quality), as we will demonstrate in the next section.
Table 2 summarizes the basic symbols and notations used in the rest of this paper.
3.2. Uncertain Embedding of the Water Network
In our previous work, we proposed an embedding graph model that reduces the complexity of querying the water information network. This task consists of first locating the captured events (e.g., pollution, leakage, pressure loss), then evaluating and selecting the suitable management policy. The proposed embedding model maps the water information network into a set of vectors, each denoting a learned representation of water-related entities. The ones with similar features (e.g., reservoirs containing low-quality water) are mapped closer and classified together. However, the previous embedding model deals with valid facts only, which means it cannot handle uncertain facts (e.g., <Pollution, ManagedBy, SedimentRemoval, 0.661>) or estimate the confidence of unseen facts, i.e., latent relations.
Definition 5. (Uncertain embedding) Given a water information network $\mathcal{G}$, the uncertain embedding consists of encoding each entity $v\in \mathcal{V}$ and relation $r\in \mathcal{E}$ into a low-dimensional vector space while preserving not only the structural graph information, but also the confidence scores of the different relations. The uncertain embedding also aims at predicting the confidence score of latent connections between entities (e.g., <PressureLoss, ManagedBy, RestorePressure, ?>). In this way, the proximity among the entities of the original UWIN is preserved in the embedding space.
Using a linear regression model, Equation (3) computes the vector representation ${v}_{i}$ of a data point ${d}_{i}$. ${f}_{i}$ denotes the feature vector of ${d}_{i}$, $W\in {\mathbb{R}}^{m\times k}$ is the weight matrix to be learned, $\lambda $ is a regularization parameter, and $\Vert \cdot {\Vert }_{2}$ is the L2-norm.
Equation (4) computes the probability $\mathcal{P}\left({e}_{ij}\right)$ of an edge ${e}_{ij}$ being present between nodes ${n}_{i}$ and ${n}_{j}$. ${w}_{ij}$ denotes the weight of the edge, and ${\gamma}_{0}$ and ${\gamma}_{1}$ are hyperparameters to be learned.
Equation (5) computes the uncertainty $\mathcal{U}\left({e}_{ij}\right)$ of an edge ${e}_{ij}$ in the uncertain information network $\mathcal{G}$. ${w}_{ij}$ denotes the weight of the edge, and ${\gamma}_{2}$ and ${\gamma}_{3}$ are hyperparameters to be learned. The uncertainty is modeled as a logistic function of the edge weight.
The uncertain knowledge graph embedding (UKGE) method assigns a probability distribution to each entity and relationship in the knowledge graph, indicating the uncertainty of their actual embedding in the latent space. In Algorithm 1, this method is used to infer additional knowledge, such as latent connections, by generating a set of probability values that reflect the probabilistic distribution of the water network entities and their relationships. The probabilistic technique was chosen due to its widespread use in handling incomplete or uncertain data, as demonstrated in [10]. The following arguments justify our decision to use this approach:
Firstly, it can help in quantifying the degree of uncertainty associated with the data collected from various sensors in the network. This can enable decision makers to have a more accurate understanding of the reliability of the data and, consequently, make more informed decisions.
Secondly, probabilistic techniques can enable the representation of complex dependencies and correlations between the different factors that contribute to the uncertainty in the water zone data. This can help in building more accurate models that can better capture the underlying dynamics of the system and, in turn, improve the decision-making process.
Finally, probabilistic techniques can provide a principled way of combining different sources of information, including historical data and expert knowledge, to arrive at a more comprehensive and robust assessment of the uncertainties in the water zones. This can lead to better-informed decisions that take into account a wide range of factors and sources of uncertainty.
Algorithm 1: Uncertain knowledge graph embedding for water quality management
Require: Water quality dataset $D=\{{d}_{1},{d}_{2},\dots ,{d}_{n}\}$; domain ontology $\mathcal{O}$; distance metric $dist$; number of dimensions k; hyperparameters $\alpha $, $\beta $, $\gamma $
Ensure: Uncertain water information network $\mathcal{G}=(\mathcal{V},\mathcal{E},\mathcal{P})$
Initialize node set $\mathcal{V}=\{\}$
Initialize edge set $\mathcal{E}=\{\}$
Initialize probability set $\mathcal{P}=\{\}$
for ${d}_{i}\in D$ do
    Extract features ${f}_{i}$ from ${d}_{i}$ using ontology $\mathcal{O}$
    Compute the vector representation ${v}_{i}$ of ${d}_{i}$ using Equation (3)
    Create a node ${n}_{i}$ in $\mathcal{V}$ with attributes ${f}_{i}$ and embedding ${v}_{i}$
    for ${n}_{j}\in \mathcal{V}$ do
        Compute the distance ${d}_{ij}=dist({v}_{i},{v}_{j})$ between node ${n}_{i}$ and ${n}_{j}$
        if ${d}_{ij}\le \alpha $ then
            Create an edge ${e}_{ij}$ between ${n}_{i}$ and ${n}_{j}$ with weight ${d}_{ij}$
            Set the probability $\mathcal{P}\left({e}_{ij}\right)$ to $\beta $ using Equation (4)
        end if
    end for
end for
for ${e}_{ij}\in \mathcal{E}$ do
    Compute the uncertainty $\mathcal{U}\left({e}_{ij}\right)$ using Equation (5)
    Set the probability $\mathcal{P}\left({e}_{ij}\right)$ to $\gamma \cdot \mathcal{U}\left({e}_{ij}\right)+(1-\gamma )\cdot \mathcal{P}\left({e}_{ij}\right)$
end for
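The two loops of Algorithm 1 can be sketched in a few lines. This is a simplified illustration, not the released implementation: the embeddings are taken as given instead of being computed by Equation (3), every new edge starts at probability β, the uncertainty is assumed to be a logistic function of the edge weight with hypothetical parameters `gamma2` and `gamma3`, and self-loops are skipped.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def euclidean(u, v):
    """Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def build_uwin(embeddings, alpha, beta, gamma, gamma2=0.0, gamma3=1.0, dist=euclidean):
    """Connect nodes whose embeddings lie within distance alpha, then blend
    each edge probability with its logistic uncertainty."""
    nodes, edge_weight, edge_prob = [], {}, {}
    for i, v_i in enumerate(embeddings):
        for j in nodes:  # compare with previously created nodes only (no self-loops)
            d_ij = dist(v_i, embeddings[j])
            if d_ij <= alpha:
                edge_weight[(j, i)] = d_ij
                edge_prob[(j, i)] = beta
        nodes.append(i)
    for edge, weight in edge_weight.items():
        # Assumed form of the edge uncertainty: logistic in the edge weight.
        u = logistic(gamma2 + gamma3 * weight)
        edge_prob[edge] = gamma * u + (1.0 - gamma) * edge_prob[edge]
    return nodes, edge_weight, edge_prob


# Hypothetical two-dimensional embeddings of three water samples.
embeddings = [(0.0, 0.0), (0.3, 0.4), (5.0, 5.0)]
nodes, edge_weight, edge_prob = build_uwin(embeddings, alpha=1.0, beta=0.8, gamma=0.5)
```

In this toy run only the first two samples are close enough to be linked, and the third remains isolated. The final probability of the edge is a convex combination of β and the logistic term, so it stays within [0, 1].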

3.3. Uncertainty-Aware Decision Making for Water Management
In this section, we define an algorithm for querying the uncertain water information network to locate the affected water entities (e.g., low-quality reservoirs) and determine the most relevant management plan. The decision process is conducted under uncertainty of the monitored data and the learned representations, particularly the candidate management policies. This uncertainty varies depending on the water’s operational parameters (e.g., pH, turbidity, dissolved oxygen, rainfall, organic carbon, chemical dosage, flow rate, conductivity, disinfectant residual, and hydraulic pressure).
The example in Table 3 depicts a set of candidate management plans with their confidence scores. These are computed based on the uncertainty degrees quantified after the monitoring phase. For example, the pressure loss could be resolved by flushing or disinfecting the concerned water zone. Since flushing has a higher confidence (0.91) than disinfection (0.83), it will be selected by Algorithm 2.
Algorithm 2: Smart water decision making
1: Input: $\mathcal{W}$: uncertain water network; ${L}_{p}$: captured events.
2: Output: P: water management plan.
3: Begin
4: $P\leftarrow \varnothing $
5: for each $e\in {L}_{p}$ do
6:     Locate e in $\mathcal{W}$
7:     for each action $a\in Context\left(e\right)$ do ▹ Obtain management actions for the affected water entity (event e)
8:         if $(e,managedBy,a)\in \mathcal{W}$ then ▹ Check the existence of the management action a in $\mathcal{W}$
9:             ${l}_{ea}\leftarrow Confidence(e,managedBy,a)$
10:            $P\left[e\right]\leftarrow P\left[e\right]\cup (a,{l}_{ea})$ ▹ Save corrective measure a for detected event e
11:        end if
12:    end for
13:    Sort $P\left[e\right]$ ▹ Sort candidate actions for event e according to their confidence score.
14: end for
15: Return P ▹ Return water management plan with several alternatives

Algorithm 2 takes, as input, a set ${L}_{p}$ of captured deviations (e.g., pressure loss, pollution), in addition to the uncertain water information network $\mathcal{W}$. The output is a set of actions denoting the water management plan with the highest confidence score. Each entity may be labeled with one or more events (e.g., pressure loss, chlorination, low nitrites level). Labeling water-related entities in the WIN allows water zones that share similar captured changes to be arranged into groups. This classification enables smart management at the class level rather than triggering a management plan for each separate water zone.
For each node ($e\in {L}_{p}$) denoting a captured event in the water environment, Algorithm 2 starts by locating its connected actions ($Context\left(e\right)$), which represent the corrective measures to deal with e (lines 6 and 7). Then, for each action a, the algorithm checks the existence of a valid triple in the possible world $\mathcal{W}\subseteq \mathcal{G}$ (line 8). This step is essential, as a triple’s confidence score reflects its ability to solve the captured event e (line 9). The confidence score is then used to keep or exclude a candidate management action (line 10). The event processing ends with the saving (line 10) and sorting (line 13) of the candidate actions. This routine is repeated for each captured event (line 5). It should be noted that the processed event concerns at least one water entity or a group, i.e., class, of entities that encounter the same deviation.
The complexity of Algorithm 2 mainly depends on the number of affected zones, i.e., captured events (${L}_{p}$), and the UWIN size, i.e., number of triples ($|\mathcal{W}|$). The cost of locating those events and determining each one’s candidate actions is $\mathcal{O}(|{L}_{p}|\cdot |N\left(e\right)|)$, where $N\left(e\right)$ is the context of an event e. For each potential management action $a\in Context\left(e\right)$, Algorithm 2 checks the existence of a valid triple relating an occurring event e and the action a. Including the sorting of the candidate actions, this operation takes $\mathcal{O}(|N\left(e\right)|\cdot |P|)$. The whole time complexity is in $\mathcal{O}(|{L}_{p}|\cdot |N\left(e\right)|\cdot |P|)$, and could be simplified to $\mathcal{O}(|{L}_{p}{|}^{2}\cdot |N\left(e\right)|)$, since the size of the set P reflects the number of captured events.
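A compact sketch of the decision routine of Algorithm 2 is given below. It assumes the uncertain network is stored as a dictionary from (event, relation, action) triples to confidence scores, which is a representation chosen for this illustration; the function and constant names are hypothetical. The confidence values reuse the examples quoted in the text.

```python
MANAGED_BY = "ManagedBy"  # relation type linking an event to a corrective action


def plan_actions(uwin, events):
    """For each captured event, collect the actions linked by ManagedBy and
    sort them by decreasing confidence score."""
    plan = {}
    for event in events:
        candidates = [
            (action, confidence)
            for (head, relation, action), confidence in uwin.items()
            if head == event and relation == MANAGED_BY
        ]
        plan[event] = sorted(candidates, key=lambda c: c[1], reverse=True)
    return plan


# Uncertain facts taken from the examples in the text.
uwin = {
    ("PressureLoss", MANAGED_BY, "Flushing"): 0.91,
    ("PressureLoss", MANAGED_BY, "Disinfection"): 0.83,
    ("Pollution", MANAGED_BY, "SedimentRemoval"): 0.661,
}

plan = plan_actions(uwin, ["PressureLoss", "Pollution"])
best_for_pressure_loss = plan["PressureLoss"][0][0]
```

This sketch scans every triple for each event, which is simpler but slower than following the adjacency of the event node as the algorithm describes; an index from events to their outgoing triples would restore the stated cost.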
4. Experiments
This section provides a detailed description of the data used in this study and the experimental setup. This includes information on the data sources, the preprocessing steps applied, and the evaluation metrics used to assess the performance of the proposed approach. This section also presents the study’s findings, including the impact of confidence levels on the accuracy of water zone classification. It provides a visualization of water zones embedding, which can aid in decision making related to water management.
In this study, we developed a solution that encodes the whole water management process (implementation source code and configuration information are available at https://github.com/msellamiTN/ukgesmartwater2022, accessed on 7 May 2023). The implementation relies on the TensorFlow [40] and scikit-learn [41] libraries. The t-distributed stochastic neighbor embedding library (t-SNE) [42] was used to project and visualize the water environment data and reduce their dimensionality.
4.1. Dataset and Experimental Protocol
We utilized a publicly available dataset called “Indian water quality data” that encompasses historical water quality information from specific locations in India [43]. This dataset includes measurements of pollutants, which are recorded as average values over a certain period. The data were sourced from official websites maintained by the Indian government. The physicochemical characteristics that describe each sample in the dataset are as follows:
Temperature: The temperature of water samples can affect various physical and chemical properties, such as the density, viscosity, and solubility of different substances.
pH: The pH level of water samples indicates their acidity or alkalinity, which can affect the chemical reactions and the behavior of different substances in water. The pH scale ranges from 0 to 14, with 7 being considered neutral, below 7 acidic, and above 7 alkaline or basic.
Conductivity: Conductivity is a measure of the ability of water to conduct electric current, which is influenced by the presence of dissolved ions or salts.
Dissolved oxygen (DO): DO is the amount of oxygen dissolved in water, which is critical for the survival of aquatic organisms and the health of aquatic ecosystems.
Biological oxygen demand (BOD): The amount of oxygen required by microorganisms to break down organic matter in the water sample, measured in milligrams per liter (mg/L).
Nitrate (NI): The concentration of nitrate ions in the water sample, usually measured in milligrams per liter (mg/L).
Fecal coliforms (FC): The presence or concentration of fecal coliform bacteria in the water sample, often used to indicate fecal contamination and potential health risks.
Total coliforms (TC): The presence or concentration of total coliform bacteria in the water sample, including fecal and nonfecal coliforms.
As the dataset lacked information on triggering events and their accompanying circumstances, the Water Quality Index (WQI) was computed for each sample using Equation (6) and used to categorize water samples. The WQI is computed as the weighted sum of the quality rating scale of the parameters, where the weights are determined by the unit weight of each parameter, calculated using Equation (6). Here, $N$ represents the total number of parameters used to calculate the WQI, and $w_j$ is the unit weight of the $j$-th parameter [24,44].
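As a rough illustration of the weighted computation described above, the following Python sketch assumes the commonly used weighted arithmetic form of the index, in which each unit weight is inversely proportional to a permissible limit and the weights are normalised to sum to 1. Equation (6) is not reproduced here, so this form is an assumption, as are the permissible limits, ideal values, class cut-offs, and all function names; none of them are taken from the study.

```python
# Hypothetical permissible limits S_j and ideal values per parameter.
# These numbers are placeholders, NOT the values used in the study.
STANDARDS = {"ph": 8.5, "do": 5.0, "bod": 5.0, "nitrate": 45.0}
IDEALS = {"ph": 7.0, "do": 14.6, "bod": 0.0, "nitrate": 0.0}


def unit_weights(standards):
    """Assumed unit weight w_j = K / S_j, with K chosen so the weights sum to 1."""
    k = 1.0 / sum(1.0 / s for s in standards.values())
    return {name: k / s for name, s in standards.items()}


def quality_rating(value, ideal, standard):
    """Assumed rating q_j: 0 at the ideal value, 100 at the permissible limit."""
    return 100.0 * abs(value - ideal) / abs(standard - ideal)


def wqi(sample):
    """Weighted sum of the quality ratings over the N parameters."""
    weights = unit_weights(STANDARDS)
    return sum(
        weights[p] * quality_rating(sample[p], IDEALS[p], STANDARDS[p])
        for p in STANDARDS
    )


def wqi_class(score):
    """Map a WQI score to one of the four classes (illustrative cut-offs only)."""
    if score <= 25:
        return "excellent"
    if score <= 50:
        return "good"
    if score <= 75:
        return "poor"
    return "very poor"


if __name__ == "__main__":
    sample = {"ph": 7.8, "do": 6.2, "bod": 3.1, "nitrate": 12.0}  # made-up sample
    score = wqi(sample)
    print(round(score, 2), wqi_class(score))
```

Under this convention a lower score means better water; a different rating scale or a different set of cut-offs would change the class boundaries but not the structure of the computation.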
4.2. Experimental Results
To examine the performance of our proposed approach, we performed several experiments, which mainly concern the effect of uncertainty. In the first experiment, we studied how confidence levels affect the accuracy of water zone classification and, subsequently, the selection of water management policies. In the second experiment, we analyzed the effect of uncertainty in high- and low-confidence settings to uncover all unclassified water areas. This allowed us to gain a deeper understanding of the significance of accounting for uncertainty in different scenarios to improve the quality of water area classification.
4.2.1. Impact of Confidence on the Accuracy of Water Zones’ Classification
In these experiments, we studied the impact of varying the confidence threshold between 0.6 and 0.8 on the accuracy of the water zones’ embedding classification.
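One way to picture the role of the threshold is as a filter over confidence-scored facts applied before embedding and classification. The sketch below is only an illustration under that assumption: the `UncertainFact` structure, the `filter_by_confidence` helper, the relation name, and the facts and scores themselves are all hypothetical and do not come from the released implementation.

```python
from typing import List, NamedTuple


class UncertainFact(NamedTuple):
    """A (head, relation, tail) triple with a confidence score in [0, 1]."""
    head: str
    relation: str
    tail: str
    confidence: float


def filter_by_confidence(facts: List[UncertainFact], threshold: float) -> List[UncertainFact]:
    """Keep only the facts whose confidence reaches the threshold."""
    return [fact for fact in facts if fact.confidence >= threshold]


# Made-up facts, for illustration only.
FACTS = [
    UncertainFact("zone_A", "hasQuality", "good", 0.92),
    UncertainFact("zone_B", "hasQuality", "very_poor", 0.55),
    UncertainFact("zone_C", "hasQuality", "poor", 0.71),
    UncertainFact("zone_D", "hasQuality", "excellent", 0.83),
]

if __name__ == "__main__":
    for threshold in (0.6, 0.7, 0.8):
        kept = filter_by_confidence(FACTS, threshold)
        print(threshold, sorted(fact.head for fact in kept))
```

Raising the threshold shrinks the set of facts that reach the embedding step, which is why zones observed only with low confidence can drop out of the classification entirely.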
Figure 3 shows the confusion matrices of the two classifiers, SVM and RF, according to the four classes (excellent, good, poor, and very poor).
From Figure 3, we can see that using high-confidence UKGE improves the classification performance of all classifiers by at least 7%. These findings highlight the importance of accounting for uncertainty in achieving accurate water zone classification based on sensor data. Indeed, considering uncertain knowledge can help in learning appropriate representations of the water information network. UKGE-learned embeddings effectively capture uncertain information and consistently outperform the SVM classifier under high and low uncertainty scores, yielding promising outcomes with the RF classifier.
Figure 4 demonstrates how confidence impacts water classification performance, notably for the SVM classifier, which experiences an 8% decrease in accuracy at low confidence, probably resulting in unclassified water zones. The RF classifier, on the other hand, is less affected by low confidence, with just a 2% decrease in accuracy. This emphasizes the significance of monitoring data accuracy in the water classification process and establishing an appropriate confidence threshold depending on the chosen classifier to ensure feasible management policies.
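The comparison protocol described above can be sketched with scikit-learn as follows. This is a minimal sketch, not the study's pipeline: synthetic Gaussian clusters stand in for the UKGE embeddings, the confidence scores are random, and the `evaluate` helper and all hyperparameters are assumptions. The accuracies it prints therefore say nothing about the reported results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

CLASSES = ["excellent", "good", "poor", "very poor"]

# Synthetic stand-in for zone embeddings: one Gaussian cluster per class.
rng = np.random.default_rng(0)
n_per_class, dim = 50, 16
X = np.vstack(
    [rng.normal(loc=i, scale=1.0, size=(n_per_class, dim)) for i in range(len(CLASSES))]
)
y = np.repeat(CLASSES, n_per_class)
# Random confidence score per zone (assumption, for illustration only).
confidence = rng.uniform(0.4, 1.0, size=len(y))


def evaluate(threshold):
    """Train SVM and RF on zones whose confidence reaches the threshold."""
    keep = confidence >= threshold
    X_tr, X_te, y_tr, y_te = train_test_split(
        X[keep], y[keep], test_size=0.3, random_state=0, stratify=y[keep]
    )
    models = {
        "SVM": SVC(kernel="rbf"),
        "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        scores[name] = accuracy_score(y_te, model.predict(X_te))
    return scores


if __name__ == "__main__":
    for threshold in (0.6, 0.8):
        print(threshold, evaluate(threshold))
```

Running the same split and the same classifiers at each threshold keeps the comparison between SVM and RF tied to the confidence setting alone.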
Furthermore, the results presented in Figure 5 indicate the effectiveness of the proposed approach for classifying uncertain water zones, particularly those with very poor quality. This is demonstrated by the increase in classification accuracy from 0% to 100% when high confidence scores are considered. On the other hand, when the monitoring data are uncertain (i.e., have a low confidence score), the embedding model may fail to recognize certain water zones, leading to lower classification accuracy.
Figure 6 presents classification performance measures with and without uncertain graph embedding, including F1 measure, accuracy, specificity, and precision. The findings show that including uncertain graph embedding improves classification quality significantly for both SVM and RF classifiers compared to the approach that considers only precise water data. RF surpasses SVM in all measures, with and without uncertainty consideration, particularly in accuracy and F1. This leads us to conclude that utilizing uncertain graph embeddings can effectively improve the accuracy of water zone classification and that RF performs better than SVM in the embedding classification task. We also observed that adjusting the confidence threshold can help identify low-quality areas, which can otherwise go undetected due to the dynamics of the water environment. Finally, we emphasize that the choice of classifier is a critical factor in classification performance, and this choice should be made based on the desired confidence level.
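For reference, the four measures named above can be derived from a multi-class confusion matrix in a one-vs-rest fashion. The sketch below shows one standard way to do so; the matrix is made up for illustration and does not correspond to Figure 3, and the function names are hypothetical.

```python
def accuracy(cm):
    """Overall accuracy: correctly classified samples over all samples."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total


def per_class_metrics(cm):
    """One-vs-rest metrics; cm[i][j] counts true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # class k predicted as another class
        fp = sum(cm[i][k] for i in range(n)) - tp  # other classes predicted as class k
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        results.append(
            {"precision": precision, "recall": recall, "specificity": specificity, "f1": f1}
        )
    return results


# Made-up 4x4 matrix (rows/columns: excellent, good, poor, very poor).
EXAMPLE_CM = [
    [8, 2, 0, 0],
    [1, 7, 2, 0],
    [0, 1, 8, 1],
    [0, 0, 2, 8],
]

if __name__ == "__main__":
    print("accuracy:", accuracy(EXAMPLE_CM))
    for label, m in zip(["excellent", "good", "poor", "very poor"], per_class_metrics(EXAMPLE_CM)):
        print(label, {name: round(value, 3) for name, value in m.items()})
```

Per-class values can then be averaged (macro or weighted) to obtain a single figure per classifier; which averaging the study used is not stated in this section.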
4.2.2. Water Zones Embedding Visualization
In these experiments, we varied the confidence threshold and analyzed its impact on the uncertain water graph embedding process. The results are shown in Figure 7 and Figure 8.
Figure 7 shows that several water zones cannot be identified with low confidence. This implies that low-confidence zones were neglected during the embedding process. For instance, with a confidence of less than 0.6, water zones with very low quality were excluded from the water zone classification process. These outcomes clearly reflect the importance of the confidence threshold and of handling water data uncertainty during the data analysis and embedding process. Water zones with low confidence should not be disregarded, but rather treated appropriately to ensure the accuracy and quality of the decision process. In addition, these results can be used to optimize water zone classification and improve the selection of water management policies.
Figure 8 also demonstrates that water zones of very poor quality were detected with a high confidence of 0.8. Thus, it can be concluded that embedding the uncertain graph enhances the classification of water zones by revealing the water zones with high uncertainty. This feature is crucial in highly dynamic and smart environments. Varying the confidence threshold in the water zone classification process can significantly improve the accuracy of the decisions produced by the water management system. It is therefore essential to determine a confidence threshold that aligns with environmental policies and requirements to obtain the best results for water zone classification and monitoring in smart environments.
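A projection of this kind can be sketched with scikit-learn's `TSNE`. The example below is a minimal sketch under stated assumptions: random vectors stand in for the learned zone embeddings, confidence scores are random, and the `project` helper and its perplexity setting are hypothetical rather than taken from the released code.

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-ins (assumptions): 60 zone embeddings of dimension 32,
# each with a random confidence score.
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(60, 32))
confidence = rng.uniform(0.4, 1.0, size=60)


def project(embeddings, confidence, threshold, perplexity=5.0):
    """Project the zones whose confidence reaches the threshold to 2-D."""
    keep = confidence >= threshold
    n_kept = int(keep.sum())
    # t-SNE requires the perplexity to be smaller than the number of samples.
    tsne = TSNE(
        n_components=2,
        perplexity=float(min(perplexity, n_kept - 1)),
        init="pca",
        random_state=0,
    )
    return tsne.fit_transform(embeddings[keep]), keep


if __name__ == "__main__":
    for threshold in (0.6, 0.8):
        coords, keep = project(embeddings, confidence, threshold)
        print(threshold, coords.shape, "zones shown:", int(keep.sum()))
```

The returned 2-D coordinates can be passed to any plotting library and colored by quality class; zones filtered out by the threshold simply do not appear in the plot.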
In summary, the results show that handling uncertainty in the water information network positively affects the recommendation of appropriate water management actions. The embedding-driven classification of water zones according to their current state helped arrange water zones by quality level. This arrangement was considerably improved by incorporating uncertainty factors. For instance, low-confidence water zones (i.e., those with high uncertainty) were excluded from the management process to avoid inappropriate recommendations. In this way, the decision on the water zones’ quality (excellent, good, poor, very poor) is based on a strictly refined set of classes. Conversely, higher confidence scores increased the likelihood of accurately classifying a water zone into one of the considered quality levels. This is understandable because a high confidence score transforms the water information network into a deterministic one, so that its content is treated correctly in its vectorized form.
In this study, we proposed an approach for decision making in IoT-based water environments through knowledge graph embedding based on probabilistic and evidence theory. However, several limitations need to be addressed. These limitations include the following:
Handling different types of uncertainty: Other techniques for modeling uncertainty, such as fuzzy logic systems and possibility theory, can help handle uncertainty in water environments, which is crucial for making accurate and reliable decisions.
Improving network representation learning: While knowledge graph embedding is a powerful technique, other network representation learning techniques, such as graph convolutional networks and attention-based models, can potentially provide more accurate and informative embeddings of water entities.
Distributed learning: Applying distributed learning to water networks can enable collaborative, scalable, and privacy-preserving analytics of water data in larger-scale and more complex smart water networks, leading to better decision making and resource management.
5. Conclusions
This work focuses on managing smart water environments by proposing an uncertainty-aware decision support system that uses data collected by a network of sensors. The system leverages probabilistic techniques and network representation learning to create a probabilistic embedding of the water information network entities. The uncertain representations are classified using network representation learning, and evidence theory is applied to make decisions that are aware of the uncertainties in the sensed water data. The proposed system triggers appropriate water management policies, considering the incompleteness and imprecision of the sensed water data. The experimental results demonstrate the effectiveness of our approach in handling uncertainty in the vectorized water network.
As future research directions, we intend to use other uncertainty models, such as fuzzy logic systems and possibility theory, to handle uncertainty in the water information network. We will also investigate the use of other network representation learning techniques (e.g., graph convolutional networks and attention-based models) to learn more accurate and informative embeddings of water entities. Additional management capabilities will also be incorporated into the proposed decision support system to handle other water-related problems (e.g., water resource allocation, water pollution detection, and groundwater depletion). Finally, a federated learning approach is under development to ensure collaborative, scalable, and privacy-preserving water data analytics in larger-scale and more complex smart water networks.