A Novel Method for Anomaly Detection and Signal Calibration in Water Quality Monitoring of an Urban Water Supply System



Introduction
The provision of drinking water of an acceptable quality that meets public health targets requires the effective implementation of a multi-barrier approach against pollutants from source to tap [1]. Water treatment plants are generally configured to treat contaminants from their raw water sources [2] and must have the capacity to dynamically respond to fluctuations in contaminants of public health concern [3,4]. This has become more important as climate change combined with anthropogenic activities is affecting the type, occurrence, and concentration of contaminants in drinking water sources. Therefore, there is a need for consistent water source monitoring to ensure optimal operations of water treatment plants to safeguard public health [5]. Traditionally, water source monitoring has largely been a labour-intensive endeavour, involving manual sampling, laboratory analysis, and in situ measurements of water quality parameters. In this regard, the selection of the number of samples, frequency of sampling, and water quality parameters to account for is often determined by national regulatory requirements coupled with source water catchment, water treatment plant capacity, and national risk-based targets. In water quality monitoring, parameters accounted for are generally categorized into physical, chemical, microbial, and radiological.
Physical and chemical parameters reflect the intrinsic physical and chemical attributes of the water and include pH, temperature, electrical conductivity, dissolved oxygen, colour, turbidity, organic compounds such as polyfluoroalkyl substances and polycyclic aromatic compounds, and inorganic compounds such as the different species of heavy metals. Microbial parameters can be broadly classified into viruses, bacteria, parasites, and protozoa. Radiological parameters characterize the ionizing radiation properties of the water. While there are sensors for in situ and continuous monitoring of most of the physical parameters, accounting for the concentrations of organic and inorganic chemical compounds, as well as microbial organisms, requires routine sampling of water and transporting the samples to a laboratory for analysis, leading to drawbacks of low sampling frequency, potential delays due to lengthy laboratory processes, and the potential oversight of significant anomalies [6]. There are currently no sensors for in situ measurements of inorganic and organic compounds in water. While there have been attempts to develop sensors for the continuous monitoring of microbial organisms in water sources, such as by observing the behaviour of rainbow trout [7], or detecting particles with UV-vis spectrophotometry [8], questions remain regarding their accuracy, reliability, and scalability.
Unlike microbial organisms, there are possibilities for the continuous monitoring of some physical and chemical parameters, which offers the advantage of being labour-free and capable of real-time monitoring, thereby facilitating enhanced efficiency with minimal or reduced delays [9]. Recent research findings [10] indicate a correlation between physico-chemical water quality and microbial indicator organisms, suggesting that the monitoring of physico-chemical parameters could be a viable alternative for determining microbial water quality. For instance, physico-chemical parameters such as turbidity, pH, electrical conductivity, and temperature are shown to have positive relationships with microbial indicator organisms [10] and heavy metals [11]. Furthermore, there now exist stable and accurate sensor technologies for monitoring physico-chemical parameters, ranging from portable devices [12] to large sensor stations [13], which can be tailored to specific water quality data collection requirements. This opens up possibilities for efficient data collection without the need for labour-intensive processes. Moreover, advances in low-cost sensor technologies have made monitoring such parameters more feasible. The cost and technological requirements of low-cost sensors have significantly reduced while maintaining stability and usability compared to wired sensors, enabling broader coverage and higher sampling frequency [14]. Leveraging these benefits for the continuous collection of physical and chemical water quality data can provide more cost-effective water quality management. However, since different organizations can choose their own parameters, no universal water quality standard can be established. This makes a system that accepts customer-specified input parameters valuable.
In Norway, water utilities are leveraging sensor technologies, IoT, and big data analytics to improve their service delivery. In the Ålesund Municipality of Norway, a Smart Water Project was initiated in 2019 for the digital transformation of the water supply system [15]. The project, among others, established a platform for the continuous monitoring of physical and chemical parameters in the drinking water source. The aim of the platform is to provide relevant water quality data for the development of decision support systems for water treatment optimization. The platform couples sensors with IoT devices for the real-time measurement, transmission, storage, and visualization of selected physico-chemical water quality parameters. Although such real-time water quality monitoring platforms have the ability to reliably monitor physico-chemical parameters, the data collected often contain errors and missing values, typically due to drifts in the sensors following calibration or malfunctioning of sensors. The collected data therefore often require extensive processing to make them useful for water quality management decisions. Moreover, due to the remote locations in which sensor platforms are typically placed in large water bodies, it can be impractical to calibrate the sensors in laboratories before sensor drift occurs [16,17]. Thus, anomaly detection and calibration are vital for the post-processing of the water quality data.
This research makes three primary contributions to the development of a water quality monitoring system.
First, as previously mentioned, the quality of the data gathered is often prone to errors due to sensor drifts and spikes, and thus, we applied anomaly detection to enhance data quality. Both supervised and unsupervised machine learning algorithms have been proposed to improve the quality of sensor data. Munir, Siddiqui, Dengel, and Ahmed proposed a deep-learning-based unsupervised machine learning algorithm for anomaly detection, achieving an outstanding F1 score of 0.87, surpassing other algorithms at that time [18]. In addition, graph-based algorithms have emerged as another promising solution, leveraging advancements in convolutional neural networks (CNNs). For example, Zhang et al. designed a multi-scale convolutional recurrent encoder-decoder framework that processes multivariate time-series data using a CNN and a recurrent neural network (RNN), surpassing traditional algorithms such as long short-term memory and support vector machine (SVM) [19]. On the other hand, supervised machine learning algorithms offer enhanced accuracy by leveraging labelled data. Through the analysis of a substantial amount of labelled data, the dataset can be thoroughly explored, enabling precise classification of out-of-sample data. For instance, Muriira, Zhao, and Min [20] employed a kernelized linear support vector machine to establish spatial links among sensor data and identify anomalies. However, the increasing number of data parameters poses a challenge for most SVM-based anomaly detection algorithms as the dimensionality becomes higher. To mitigate this issue, Borghesi et al. [21] utilized an AutoEncoder to extract normal patterns and reduce the feature space, while Canizo, Triguero, Conde, and Onieva [22] applied a one-dimensional CNN to extract features from individual sensors and classify them with an RNN. Their studies achieved high accuracy in industrial scenarios. In this research, the raw data initially lacked any labels. However, considering the effectiveness and accuracy of supervised machine learning, an interview was conducted with experts in the domain to obtain their assistance in labelling the data.
Second, because in-laboratory sensor data calibration is both time consuming and subjective, there being no standard currently, the data-driven alternative is considered. Numerous supervised machine learning algorithms have been studied in previous research. In a study by Guo et al. [23], an artificial neural network (ANN), random forest, and SVM regression were compared on a dataset collected from a small urban lake in northern China, with the ANN showing the highest performance. However, Bao et al. [24] demonstrated that random forest also performed well on a different dataset, while Tenjo et al. [25] obtained better results with SVM than ANN. In addition to these classic algorithms, CNNs have also shown significant promise in this field. Maier, Keller, and Hinz [26] developed a highly accurate method for estimating chlorophyll concentration using a one-dimensional CNN, which was shown to be applicable to real-world scenarios. Furthermore, researchers have explored combining different algorithms to improve performance. For example, Arnault et al. [27] combined an ANN with hierarchical agglomerative clustering, while Wang et al. [28] used a genetic-algorithm-based SVM approach. However, it is common among these studies that only the temporal dynamics are considered while the synchrony among entities is neglected. To fully utilize the dataset, we employ a self-organizing map (SOM) to produce criteria for forecasting water quality based on the physical and chemical features of the water source.
Third, to further enhance data analysis and interpretation, the collected data are clustered into different levels using unsupervised machine learning algorithms. Given the high dimensionality of the data collected from wireless sensor networks (WSNs), feature extraction plays a pivotal role in various applications in this field. Researchers have leveraged different techniques for this purpose, such as SOM, a neural-network-based clustering algorithm which has been used for extracting lower-dimensional tensors to enable data visualization and pattern analysis [29,30]. Inspired by the findings from these studies, we decided to conduct our own experiments to investigate the synchrony among different indicators, and thus, build a standard for the current Smart Water Project.

Methodology
Figure 1 provides the framework for the proposed data-driven method, outlining its distinct components, each represented by a unique colour. The data source, marked in blue, is the raw data collected from the water quality monitoring platform. The data processing and analysis component, marked in green, first assesses whether the incoming data require calibration. If no anomaly is detected, the sensor data are stored directly in the data repository as high-quality data. If an anomaly is detected, the data are first passed through the signal calibration component, and the calibrated data are then stored as high-quality data. In both cases, the accumulated high-quality data are input into the pre-trained water quality clustering model to generate a timely water quality monitoring report. The final component, marked in orange, represents the output of the entire system: the high-quality data and the water quality clusters. The former is a crucial resource for subsequent research and informs decision-making in water quality monitoring, while the latter provide a comprehensive understanding of water quality categorization.

Data Source
This study leverages water quality data from a Vertical Profiler System, an anchored sensor platform made by the Water and Environmental Engineering Group at NTNU in Ålesund. The sensor platform is located at 62.48° N and 6.47° E in Brusdalsvatnet Lake in Ålesund, Norway, as depicted in Figure 2. The platform has an on-board winch system with a multi-parameter sensor (EXO3) that autonomously measures water quality parameters at different depths of its profiling location. The parameters accounted for are temperature, conductivity, salinity, turbidity, pH, optical dissolved oxygen (ODO), and fluorescent dissolved organic matter (fDOM), as depicted in Figure 3. However, in this study, only pH and turbidity are studied due to the lack of clean data for other parameters. Data on the parameters are relayed to a local server via a 900 MHz radio link. The stored data are structured as interlinked time series. For the purposes of this study, the dataset used was from 9 June 2020 to 19 August 2022. This period covers the first phase of the platform installation, where significant anomalies in the dataset from the platform were recorded. Table 1 provides a descriptive overview of the raw data collected during the period. The count column indicates the number of observations for each parameter, while the min and max columns indicate the minimum and maximum values for each parameter. The mean column presents the average value for each parameter, and the missing data column shows the number of missing values for each parameter. However, the total amount of data is less than anticipated due to the platform's inoperability during the lake's freezing periods (usually from December to March) or the sensor platform's maintenance. By examining the mean values, it is apparent that not all the data are reliable. For instance, the average pH should not be 2.60, which indicates a strong acid. Moreover, considering both the minimum and maximum values, it is apparent that the raw data contain outliers in all indicators except timestamps and depths.
To investigate the events that occurred on the platform, Figure 4 illustrates the temporal dynamics of each sensor at a depth of 1 m. By comparing the logged events with the recordings, we can estimate when each event occurred and provide an explanation for it. For instance, the decline in measurements on 22 November 2020 and 11 April 2021, as well as 28 May 2021 and 23 March 2022, was due to winter maintenance and broken sensors, respectively. Even minor changes, such as the sensor calibration on 26 September 2020, can impact the data's usability and necessitate data cleaning procedures. Further analysis is conducted through the data distribution, as depicted in Figure 5. It can be observed that the sensor data do not follow normal distributions; for example, pH has three peaks, while both turbidity NTU and turbidity FNU have several local maxima as well. According to the experts, this deviation does not always indicate an error. With their assistance, the sensor error label was added. The objective of the calibration was to make the sensor data as close to the clean data as possible in terms of distribution, mean value, and standard deviation. The detection and correction of missing data are important in any data analysis project. However, in this project, only spiky data and drift data were considered for calibration. This is because the amount of missing data in the test dataset was small, and the low sampling frequency of the sensor data, which were collected every 12 h, made missing data less of an issue. Therefore, the focus was primarily on correcting spiky and drift data, which have a greater impact on the accuracy and reliability of the data analysis. Four labels were attached to the test dataset, where group 0 represents no error, group 1 represents spiky data, group 2 represents drifted data, and group 3 represents both types of error.
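The four-group labelling scheme can be illustrated with a small sketch. It assumes each sample carries two expert-assigned boolean flags (spike present, drift present); the function name `error_group` and the `GROUP_NAMES` lookup are hypothetical and not taken from the study.

```python
def error_group(has_spike, has_drift):
    """Map two boolean fault flags to the four group labels described in the text.

    0 = no error, 1 = spiky, 2 = drifted, 3 = both.
    The two-flag input is an assumption made for illustration.
    """
    return int(has_spike) + 2 * int(has_drift)


# Hypothetical lookup table for readable reporting.
GROUP_NAMES = {0: "no error", 1: "spiky", 2: "drift", 3: "both"}

labels = [error_group(s, d) for s, d in [(False, False), (True, False),
                                         (False, True), (True, True)]]
```

Encoding the two fault types as independent bits keeps the combined class (group 3) consistent with the single-fault classes.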

Water Quality Anomaly Detection
The initial step after collecting raw data in this project is anomaly detection, which aims to classify the signal based on the presence of outliers and the type of fault. Figure 6 depicts the workflow. The data from various sensors are combined using a recurrence map based on location. Subsequently, a CNN is employed to classify the data into different categories based on the presence of anomalies. The effectiveness of this anomaly detection component is evaluated using training loss and accuracy and a confusion matrix. A recurrence map, as its name suggests, is a visualization tool used to analyse the repetition of data or patterns in given sequences. It works by comparing the distance between two states of a system, and expressing the difference using the following equation [31]:

$$R(t, \tau) = \min\left(\left\lfloor \frac{\lVert x(t) - x(\tau) \rVert}{\epsilon} \right\rfloor,\; N\right) \tag{1}$$

In this equation, R represents the distance, while x(t) and x(τ) denote the points in the sequence. ϵ is a predefined standard number used for measuring distance. R is calculated as the number of ϵ differences between the two states, up to the maximum difference, which is defined as N. Once a signal is provided, the Euclidean distances between every pair of states are calculated to form the recurrence matrix, with dimensions equal to a predefined window length. With such a matrix, an image-like form of the data can be generated for further analysis.
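As a minimal sketch of how such a recurrence matrix might be built, the following assumes the distance is measured in whole steps of ϵ and capped at N. The function name, the parameter names `eps` and `n_max`, and the sample readings are illustrative assumptions, not the study's implementation.

```python
import math


def recurrence_matrix(window, eps, n_max):
    """Build a binned recurrence matrix for one window of states.

    window: list of states, each a sequence of floats (one value per sensor).
    eps:    step size used to measure distance (assumed > 0).
    n_max:  cap on the binned distance (the maximum difference N).
    """
    size = len(window)
    matrix = [[0] * size for _ in range(size)]
    for t in range(size):
        for tau in range(size):
            dist = math.dist(window[t], window[tau])  # Euclidean distance
            matrix[t][tau] = min(int(dist // eps), n_max)
    return matrix


# Hypothetical pH readings, one sensor value per state.
ph_window = [[7.0], [7.1], [7.0], [9.5]]
rm = recurrence_matrix(ph_window, eps=0.5, n_max=3)
```

In this toy window, the last reading is far from the others, so its row and column saturate at the cap while the remaining entries stay at zero. This is the kind of image-like contrast a CNN can pick up.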
The architecture of the CNN is illustrated in Figure 7, and the input data are from the 2D recurrence matrix with a dimension equal to a predefined window length. With reference to the labels, the network adjusts the weights and biases in every layer during training. By leveraging 2 convolutional and 2 pooling layers, the input matrix is transformed into feature maps. A dropout layer is then applied to avoid overfitting, and the feature maps are flattened into a feature vector. Lastly, a fully connected layer, which is a multilayer perceptron neural network, is applied to extract the classification of the sensor data. The extracted features are calculated with

$$y = W_{FC}\, x \tag{2}$$

where x and y are the input features and output classifications, respectively, and W_FC is the linear weights [32]. It is noted that the Softmax activation function is used for the last layer of the CNN classifier since it is a multiclass classification problem. Softmax is an activation function that is used to normalize a vector such that the sum of all the elements in the resulting vector is equal to one. It is mathematically represented in Equation (3) [33]:

$$\mathrm{softmax}(z_j) = \frac{e^{z_j}}{\sum_{k=1}^{C} e^{z_k}} \tag{3}$$

Here, z_j represents the j-th element in a vector, and C represents the dimension of the input vector. By using this equation, the probability distribution among different classes can be calculated, and the trainable parameters can be adjusted to minimize the loss function, which is defined as categorical cross-entropy. It is calculated using Equation (4) [34]:

$$L = -\sum_{i} \sum_{c=1}^{C} y_{i,c} \log \hat{y}_{i,c} \tag{4}$$

where y_{i,c} and ŷ_{i,c} are the entries for class c of the true one-hot encoded vector and the prediction probabilities, respectively, of the i-th instance.
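The Softmax normalization and categorical cross-entropy loss described above can be sketched in a few lines. The score vector and the one-hot label are made-up values for the four error classes; the `tiny` constant is an assumption added only to guard against taking the logarithm of zero.

```python
import math


def softmax(z):
    """Normalise a score vector so its entries are positive and sum to one."""
    shift = max(z)  # subtracting the maximum avoids overflow; the result is unchanged
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(y_true, y_prob, tiny=1e-12):
    """Cross-entropy between a one-hot label and predicted class probabilities."""
    return -sum(t * math.log(p + tiny) for t, p in zip(y_true, y_prob))


scores = [2.0, 0.5, 0.1, -1.0]   # hypothetical last-layer outputs for the four classes
probs = softmax(scores)
loss = categorical_cross_entropy([1, 0, 0, 0], probs)
```

With a one-hot label, the loss reduces to the negative log-probability assigned to the true class, so it shrinks as the classifier becomes more confident in the correct label.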

Correlated Time Series Calibration with AutoEncoder
After detecting anomalies in the sensor data, the signal calibration component is employed to correct the errors. As the data potentially form correlated time series, both time dynamics and correlations can be considered in order to properly calibrate the data, and thus, an AutoEncoder-based network is proposed.
AutoEncoder is a type of neural network trained to generate an output identical to the input. Although the output x'_j and input x_j are not exactly the same, they remain consistent with each other in probabilistic terms such as mean and standard deviation [35]. Thus, it is commonly utilized for dimensionality reduction and image denoising. The AutoEncoder model, as shown in Figure 8, can be divided into two parts: the encoder, which receives input data and processes it to generate a code; and the decoder, which regenerates the data from the code to make it as similar to the input data as possible. This process enables the AutoEncoder to extract important features from the input data and remove noise. At the beginning of this model, data are encoded into a 2D tensor, where both temporal and correlation information will be considered. The tensor is of dimensions 10 × 20 × 1, where 10 indicates the number of indicators considered, 20 is the length of the time series window, and 1 is the channel number. Specifically, each tensor comprises 20 samples containing all the measured parameters. In an AutoEncoder network, the trainable parameters can be represented by $\theta = \{W, W', b, b'\}$, where $W = \{w_j,\ j = 1, 2, \ldots, n\}$ and $b = \{b_j,\ j = 1, 2, \ldots, n\}$ are the weight matrix and bias vector for the encoder layer, while $W' = \{w'_j,\ j = 1, 2, \ldots, n\}$ and $b' = \{b'_j,\ j = 1, 2, \ldots, n\}$ are the weight matrix and bias vector for the decoder layer [36]. For both encode and decode operations, the rectified linear unit (ReLU) activation function, as shown in Equation (5), was selected [37]:

$$s(x) = \max(0, x) \tag{5}$$
Thus, the j-th element h_j of the compressed code can be expressed as [36]

$$h_j = s(x * w_j + b_j) \tag{6}$$

where * is the convolution layer operation and s is the activation function.
With the calculation result, the regenerated data can be obtained through [36]

$$x' = s\left(\sum_{j \in H} h_j * w'_j + b'\right) \tag{7}$$

where H is the collection of feature maps.
Unlike the traditional AutoEncoder model, which measures the difference between input and output data, the error in this project is taken as the difference between uncorrupted data and regenerated data when estimating the effectiveness of the proposed model. In this project, the sigmoid function, as shown in Equation (8) [38], was chosen to measure the cross-entropy error, which is then minimized during the training process:

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{8}$$
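The training objective described above, comparing regenerated data against uncorrupted data rather than against the noisy input, can be sketched as follows. This is a simplified illustration only: it assumes the clean values have been scaled to [0, 1] and that the decoder emits raw scores that are squashed by a sigmoid. The function names and sample values are hypothetical.

```python
import math


def relu(x):
    """Rectified linear unit, used for the encode and decode operations."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic function mapping a raw score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def reconstruction_loss(clean, raw_outputs, tiny=1e-12):
    """Mean cross-entropy between clean targets in [0, 1] and sigmoid(raw_outputs)."""
    total = 0.0
    for c, z in zip(clean, raw_outputs):
        p = sigmoid(z)
        total -= c * math.log(p + tiny) + (1.0 - c) * math.log(1.0 - p + tiny)
    return total / len(clean)


clean_window = [0.2, 0.8, 0.5]                                   # assumed scaled clean data
good_outputs = [math.log(c / (1.0 - c)) for c in clean_window]   # sigmoid recovers the targets
bad_outputs = [3.0, -3.0, 2.0]                                   # a poor reconstruction
loss_good = reconstruction_loss(clean_window, good_outputs)
loss_bad = reconstruction_loss(clean_window, bad_outputs)
```

Because the target is the clean signal, a network that merely copies a corrupted input is penalized, which is what pushes the model towards calibration instead of plain reconstruction.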

Water Quality Clustering with Self-Organizing Map (SOM)
SOM clustering is an unsupervised machine learning algorithm. It was first proposed by Teuvo Kohonen in 1982 [39]. Basically, it receives n-dimensional input vectors and feeds them into a neural network to generate a two-dimensional map that can retain the original information in the input dataset. This map preserves the structural information of the data points in the dataset which, in other words, provides similar interconnecting weights to the neighbouring points. Moreover, the map itself also contains information about centroids. Every point on the map is related to the interconnecting weights and the points' values represent the centroids. The size of the map indicates the assumed number of clusters. In this project, an SOM-based model was proposed to cluster the water quality data. The topology is depicted in Figure 9.
[Figure 9: topology of the SOM-based model, with stages labelled "High quality data", "Data encoding", and "Output map".]

The present model utilizes input data derived from the high-quality data obtained in the previous section. To comply with the requirements of the sensor platform, the data are structured as a 3D tensor with the dimensions of location, sensors, and timestamp. The first step involves encoding this tensor into a 2D image, which organizes the data collected at the same time. One dimension of the image represents the depth while the other dimension comprises different sensor data. This transformation changes the problem into an image clustering task while preserving the spatial correlation among the data. Moreover, by analysing the data collected at various timestamps, the temporal changes in water quality can be investigated in detail. Then, the images are processed through the SOM network, which analyses the input data at the pixel level. The training process for SOM is outlined in Algorithm 1.
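The encoding step can be sketched as a simple re-indexing. The example assumes the 3D tensor is stored as nested lists indexed `[depth][sensor][time]`; that layout, the function name `encode_images`, and the sample values are illustrative assumptions.

```python
def encode_images(tensor):
    """Split a depth x sensor x time tensor into one depth x sensor image per timestamp."""
    n_depth = len(tensor)
    n_sensor = len(tensor[0])
    n_time = len(tensor[0][0])
    return [
        [[tensor[d][s][t] for s in range(n_sensor)] for d in range(n_depth)]
        for t in range(n_time)
    ]


# Hypothetical readings: 2 depths x 2 sensors (pH, turbidity) x 3 timestamps.
tensor = [
    [[7.0, 7.1, 7.2], [0.10, 0.11, 0.12]],   # depth 1
    [[6.9, 7.0, 7.1], [0.20, 0.21, 0.22]],   # depth 2
]
images = encode_images(tensor)
```

Each resulting image holds every depth and sensor for a single timestamp, so clustering the images groups moments in time with similar depth profiles.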

Algorithm 1 Self-Organizing Map Algorithm
Require: Set of input data points X = {x_1, x_2, ..., x_n}, stopping criterion
Ensure: Set of cluster labels label_1, label_2, ..., label_k
Initialize the weights of the nodes randomly
Repeat
    Select a data point randomly from the input dataset
    distance ← the Euclidean distance between the selected point and the weight vector of every node
    BMU ← the node with the closest weight vector
    Update the weights of the BMU and its neighbouring nodes
Until stopping criterion is met

The weight values are initialized randomly, and then, for each input sample, the weight at the best matching unit and its neighbouring weights are updated with a learning rate. The best matching unit is defined as the node where the distance between a sample and the weighted vectors is minimized, and thus, can be computed as shown in [39]:

$$\lVert x - m_c \rVert = \min_i \lVert x - m_i \rVert$$

where x is the sample vector and m_c and m_i are the best matching unit and i-th weighted vector, respectively. The neighbouring weighted vectors at t + 1 can then be calculated based on the value at time t as [40]

$$m_i(t+1) = m_i(t) + \eta(t)\, h_{ci}(t)\, [x(t) - m_i(t)]$$

Here, η(t) is the learning rate, which decays with time, and h_ci denotes the neighbouring kernel, which is expressed as

$$h_{ci}(t) = \exp\left(-\frac{d_{ci}^2}{2\sigma^2(t)}\right)$$

Here, d_ci denotes the distance from the best-matching unit to the i-th neighbouring node, while σ(t) is the radius of the neighbouring area, which follows an exponentially decaying function.
The updating finishes when the convergence requirement or the iteration limit is reached. Normally, convergence is assessed through the dissimilarity, which can be calculated as the distance between data points. This project achieves this by using the Euclidean distance, where the weighted vectors are used to approximate the centroids.
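The training loop of Algorithm 1 can be sketched in plain Python. This is a minimal illustration, not the study's implementation: it assumes a Gaussian neighbouring kernel, a single exponential decay schedule shared by the learning rate and the radius, and a fixed iteration count as the stopping criterion. All parameter names (`eta0`, `sigma0`, `n_iter`, `seed`) and the toy data are hypothetical.

```python
import math
import random


def train_som(data, rows, cols, n_iter=500, eta0=0.5, sigma0=1.0, seed=0):
    """Train a rows x cols self-organizing map and return {grid node: weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialize the weights of the nodes randomly.
    weights = {n: [rng.random() for _ in range(dim)] for n in nodes}
    for t in range(n_iter):
        decay = math.exp(-t / n_iter)            # assumed exponential decay schedule
        eta, sigma = eta0 * decay, sigma0 * decay
        x = rng.choice(data)                      # select a data point randomly
        # Best matching unit: node whose weight vector is closest to the sample.
        bmu = min(nodes, key=lambda n: math.dist(x, weights[n]))
        for n in nodes:
            d_ci = math.dist(bmu, n)              # grid distance from the BMU to node n
            h_ci = math.exp(-d_ci ** 2 / (2 * sigma ** 2))
            weights[n] = [w + eta * h_ci * (xi - w) for w, xi in zip(weights[n], x)]
    return weights


def assign(x, weights):
    """Return the grid node (cluster) whose weight vector is closest to x."""
    return min(weights, key=lambda n: math.dist(x, weights[n]))


# Two hypothetical, well-separated groups of normalized readings.
group_a = [[0.10, 0.10], [0.12, 0.08], [0.09, 0.11]]
group_b = [[0.90, 0.90], [0.88, 0.92], [0.91, 0.89]]
som = train_som(group_a + group_b, rows=2, cols=1)
```

On this toy input a 2 × 1 map should place one node near each group, so `assign` returns the same node for samples within a group and different nodes across groups.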

Evaluation Metrics
In this project, the key parameter that needs to be defined is the number of clusters, denoted by K. The selection of K is performed by comparing the distortion and the silhouette score: the former evaluates the tightness of the clusters, while the latter evaluates cohesion and separation together.
Distortion, also referred to as intra-cluster distance, is a widely employed measure for assessing the performance of clustering algorithms. It quantifies the average squared Euclidean distance between each data point and its corresponding cluster centroid. By evaluating distortion, researchers can determine the quality of clustering results and make informed decisions regarding the optimal number of clusters.
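Distortion as defined above can be computed directly. The sketch below uses made-up points and centroids; the function name and data layout (a list of points, a parallel list of cluster labels, and a dictionary of centroids) are assumptions for illustration.

```python
def distortion(points, labels, centroids):
    """Average squared Euclidean distance from each point to its assigned centroid."""
    total = 0.0
    for p, k in zip(points, labels):
        total += sum((a - b) ** 2 for a, b in zip(p, centroids[k]))
    return total / len(points)


points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]

# Two clusters, each centred between its two members.
two_clusters = distortion(points, [0, 0, 1, 1], {0: [0.0, 1.0], 1: [10.0, 1.0]})

# One cluster centred on the overall mean.
one_cluster = distortion(points, [0, 0, 0, 0], {0: [5.0, 1.0]})
```

Distortion always falls as clusters are added, which is why it is usually read together with a second metric instead of being minimized on its own.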
For the silhouette score, cohesion refers to how close a data point is to the other members of its own cluster, while separation refers to how far it is from the members of other clusters. Specifically, the silhouette score compares cohesion with separation, and is calculated using the following equation [41]:

$$s_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$

Here, a_i is the mean distance from the i-th data point to all other data points in the same cluster, and b_i is the mean distance from the i-th data point to the data points in the nearest other cluster.
The silhouette score has a range from −1 to 1, where a higher score indicates higher intra-cluster similarity and lower inter-cluster similarity. Specifically, 1, 0, and −1 denote the best, indifferent, and wrong clustering results, respectively.
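A small sketch of the silhouette computation follows. It uses the standard point-based definition, where cohesion is the mean distance to the other members of the same cluster and separation is the mean distance to the members of the nearest other cluster. The function name and toy data are assumptions, and singleton clusters are given a score of 0 by convention.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette over all points; requires at least two clusters."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        # Mean distance from point i to the members of each cluster (excluding itself).
        mean_dist = {}
        for k in clusters:
            members = [q for j, q in enumerate(points) if labels[j] == k and j != i]
            if members:
                mean_dist[k] = sum(math.dist(p, q) for q in members) / len(members)
        own = labels[i]
        if own not in mean_dist:          # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        a_i = mean_dist[own]                                       # cohesion
        b_i = min(d for k, d in mean_dist.items() if k != own)     # separation
        denom = max(a_i, b_i)
        scores.append((b_i - a_i) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
good = silhouette_score(points, [0, 0, 1, 1])   # groups nearby points together
bad = silhouette_score(points, [0, 1, 0, 1])    # pairs each point with a distant one
```

The well-grouped labelling scores close to 1, while the deliberately mismatched labelling scores below 0, matching the interpretation of the range given above.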
However, both metrics have their own limitations. The distortion does not take the separation between clusters into consideration, and the silhouette score is sensitive to noise and cannot handle overlapping clusters. Therefore, the combined results of both metrics are used to determine the final number of clusters.

Water Quality Sensor Data Anomaly Detection
The labelled data were transformed into recurrence maps, as depicted in Figure 1. To mitigate the influence of varying scales across different sensor data, the data were normalized using Equation (15):

$$x' = \frac{x - X.\mathrm{mean}}{X.\mathrm{std}} \tag{15}$$

where X represents the data series, and x and x' denote the original and normalized data. Here, X.mean and X.std represent the mean value and standard deviation of the series, respectively. Two different mapping strategies were employed in this study. The first strategy considered the temporal dynamics of each sensor by retaining the original data values and calculating the recurrence matrix, as described in Section 2.2. The second strategy involved the normalization of data from multiple sensors recorded at the same time.
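The normalization step can be sketched with the standard library. The choice of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say whether the sample or population form was used, and the turbidity values are made up.

```python
import statistics


def z_normalise(series):
    """Subtract the series mean and divide by the series standard deviation."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)   # population standard deviation (assumed)
    return [(x - mean) / std for x in series]


turbidity = [0.10, 0.12, 0.11, 0.15, 0.12]   # hypothetical readings
normalised = z_normalise(turbidity)
```

After this step every sensor series has zero mean and unit spread, so sensors with large numeric ranges do not dominate the distances used in the recurrence matrix.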
After the matrices are generated, they are fed into the CNN classifier for testing. Both the loss and the accuracy are presented to evaluate the model's performance. The results can be seen in Figure 10. Although for both strategies the overall accuracy increases with the training epoch and finally reaches a relatively stable stage, the accuracy from the normalized multisensor strategy is higher than that from the unnormalized sole-sensor strategy. The former can reach as high as 99.6% while the latter can only reach 94.3%. To provide a more detailed comparison, a confusion matrix was utilized. Table 2 shows the confusion matrix, which is calculated from the detection class for every class. The F1 score, the harmonic mean of precision and recall, was applied to analyse this matrix; it can be expressed as follows:

$$F1(\text{class} = a) = \frac{2 \cdot \text{precision}(\text{class} = a) \cdot \text{recall}(\text{class} = a)}{\text{precision}(\text{class} = a) + \text{recall}(\text{class} = a)} \tag{16}$$

where precision(class = a) is defined as the ratio of true positive (TP) results to the sum of true positive and false positive (FP) results for class a, which measures the model's ability to identify positive results, and recall(class = a) is defined as the ratio of TP results to the sum of TP and false negative (FN) results for class a, which measures the ability to capture all positive examples. It should be noted that these are based on a single class in a multiclass classification problem. In this case, the F1 score was calculated for each class separately as 0.96, 0.71, 0.97, and 0.96 for "clean", "drift", "spiky", and "both" classes, respectively.

A comparison between raw data and calibrated data at a depth of 1 m is depicted in Figure 11. In all three figures, red lines represent the original data, while blue lines represent the calibrated data. From Figure 11a, we can see that the raw pH sensor data suffered from both drift and spikiness. After 60 samples, the mean value decreases, and the variation starts increasing. Compared to these raw data, the calibrated data are more usable, showing less drift and fewer spikes. Figure 11b and Figure 11c show positive results for turbidity NTU and turbidity FNU, where drift is the primary challenge. The drift is reduced compared to the uncalibrated data, and the distribution is closer to the 60 clean data points. To further analyse the data quality, a study of the calibrated data distribution is conducted, and the results are shown in Table 3 and Figure 12. Since no dramatic changes occurred during the data collection period, the sensor data should follow the same distribution. Thus, the mean value, standard deviation, and density distribution are examined. From Table 3, it can be observed that the regenerated pH data have the closest mean value and standard deviation, with offsets of 1.6% and 17.6%, respectively. The calibrated turbidity NTU and FNU also show smaller deviations from the clean data compared to the uncalibrated data. Although drift still exists, the mean value drops from 1.145 to 0.101, with a baseline of 0.126. This can be considered effective for data drift calibration. By examining the distribution in Figure 12, a similar conclusion can be drawn. For all three parameters, the shapes formed by the calibrated data are closer to those of the clean data than the shapes formed by the uncalibrated data. It should be noted that the presented findings are consistent with the performance of the algorithms for other depths, given the same sensor platform, location, and data structure.
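The effect of calibration on the turbidity mean can be made concrete with a short calculation. The sketch assumes that "offset" means the absolute deviation from the clean baseline expressed as a percentage of that baseline; this definition is an assumption, as the text does not spell it out. The three input values are the means quoted above.

```python
def relative_offset(value, baseline):
    """Absolute deviation from the baseline, as a percentage of the baseline."""
    return abs(value - baseline) / abs(baseline) * 100.0


baseline_mean = 0.126       # clean-data mean quoted in the text
raw_mean = 1.145            # uncalibrated mean quoted in the text
calibrated_mean = 0.101     # calibrated mean quoted in the text

raw_offset = relative_offset(raw_mean, baseline_mean)
calibrated_offset = relative_offset(calibrated_mean, baseline_mean)
```

Under this assumed definition, the uncalibrated mean sits several hundred percent above the baseline, while the calibrated mean lies within roughly a fifth of it. This agrees with the reading that drift is greatly reduced but not eliminated.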

Water Quality Clustering
With these calibrated data, clustering algorithms can be applied to generate the final clustering for water quality monitoring. In addition to the originally clean data, the calibrated data are also taken into consideration. The combination is then fed into the clustering algorithm to evaluate the final results for our project.
Figure 13 illustrates the outcomes obtained from applying the SOM clustering algorithm. These heatmaps enable a comparison of the silhouette scores for different SOM grid shapes, aiding in the determination of optimal parameters. Shapes (2, 1), (3, 1), and (2, 2) exhibit identical silhouette scores of 0.73. Among these, both (2, 1) and (3, 1) indicate a tendency for samples to be clustered into two groups. However, this preference for fewer clusters stems from the silhouette score's inclination towards configurations with minimal dissimilarity within clusters and maximal dissimilarity between clusters, rather than from a consideration of the number of clusters alone. Consequently, it is crucial to consider alternative shapes that yield high scores. To this end, shape (2, 2) clusters the data into three distinct groups. The clustering distribution with AutoEncoder-calibrated data is visualized in Figure 14. In this case, cluster 0 has the greatest portion of data points (49.2%), while 23.0% of the data are assigned to cluster 3. This indicates a balanced distribution among the clusters, which is required by the clustering algorithms.
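To make the role of the silhouette score concrete, the sketch below computes the mean silhouette coefficient for a given cluster assignment in plain Python. It does not implement the SOM itself; it only scores labels that some clustering step has already produced. The function name `silhouette_score`, the use of Euclidean distance, and the toy points are assumptions for illustration and are not taken from the study's data.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all samples (Euclidean distance).

    For sample i: a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b). Samples in singleton clusters get s = 0.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    def mean_dist(i, members):
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton cluster: contributes s = 0
        a = mean_dist(i, own)
        b = min(mean_dist(i, members)
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


# Hypothetical 2-D samples forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels follow the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
print(f"good assignment: {good:.3f}, bad assignment: {bad:.3f}")
```

An assignment that follows the natural groups scores close to 1, while one that cuts across them scores below 0. The score rewards compact, well-separated clusters regardless of how many there are, which is why several grid shapes can tie and expert judgement on the number of clusters is still needed.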
During discussions on the clustering results, the experts emphasized the significance of determining the appropriate number of clusters. In practice, it is uncommon to have more than five clusters, especially in the context of drinking water reservoirs, where careful selection and minimal drastic changes are expected. However, merely having two clusters does not provide accurate enough results to represent the final outcome. Consequently, the experts recommended that three, four, or five clusters would be the most suitable choices. Subsequently, they examined the clustering results and confirmed that the clustering achieved using the SOM with AutoEncoder-calibrated data was reasonable and could be further explored through in-depth analysis.

Potential Limitations
The approach outlined in this project faces several limitations, specifically pertaining to environmental factors, workforce issues, and standardization challenges.
One of the primary limitations is related to the environmental conditions in which the experiments are conducted. The experimental platform is situated in a lake in Ålesund, and during the winter season, the platform freezes, rendering it incapable of collecting data. Consequently, this poses a significant obstacle in obtaining accurate and reliable data during those periods. Moreover, the surrounding environment further complicates data collection efforts. The reservoir is not located in an isolated area, as there is nearby transportation infrastructure that may introduce unwanted influences on the water quality sensors.
The second limitation revolves around the workforce involved in the project, particularly in terms of data labelling. Labelling the data is crucial not only for anomaly detection but also for enhancing calibration and clustering outcomes. Allocating additional resources to ensure the collection of accurate sensor data and establishing standardized clustering procedures would greatly improve the overall accuracy of the solution. However, acquiring access to water quality data, which is confidential, proves to be challenging, hindering the expansion of the dataset. Consequently, human labelling remains the most viable option, despite requiring a substantial labour force.
The final limitation concerns the usability of the overall solution and the need for standardization. Establishing a comprehensive set of standards is essential to enhance the system's utility and comprehensibility. These standards would encompass indicator selection, data collection, and processing procedures, as well as the final clustering methodology. However, each city follows its own unique set of standards, and different systems are employed, further complicating the task of standardization. Overcoming this challenge is difficult due to the inherent variations across different locations and systems.

Conclusions
Currently, water source monitoring in water supply systems relies on regular sampling and analysis for microbial water quality indicators and other pollutants such as heavy metals to ensure the provision of safe drinking water to the public. However, these methods require long periods of in-laboratory processing, are costly, and make real-time monitoring impractical. To address this limitation, a sensor platform has been implemented to collect data from Brusdalsvatnet Lake, the drinking water source for the municipality of Ålesund in Norway. Yet, measurements from the sensor platform often contain anomalies, making it difficult to directly apply the data for effective management decisions. This study proposes an architecture that enables more rapid and efficient preprocessing and analysis of the data transmitted by the sensor platform. By implementing this proposed architecture, the water supply system can promptly identify anomalies, calibrate the data, and conduct clustering, ensuring effective management of water resources. This contributes to advancements in water quality monitoring and provides valuable insights for the stakeholders and decision-makers involved.
The significance of the proposed system in the water quality monitoring for the municipality was underscored by the stakeholders. They emphasized the system's seamless data analysis capabilities, which ultimately yield substantial benefits for them by providing valuable insights into potential fluctuations in water quality. To further illustrate the potential applications of the collected data, one example is the utilization of fluid dynamics simulations. These simulations can be performed using the collected data, allowing for the prediction and assessment of water quality dynamics [42]. Moreover, the recorded data can be employed for water risk management purposes. The findings and observations derived from the monitoring system can serve as essential inputs for assessing and mitigating potential risks associated with water quality in urban water supply systems [43].
To make the whole system more feasible, the future work of this project involves expanding its scope, collecting more data, collaborating with domain experts to validate and interpret the results, and integrating the system with other relevant systems.By pursuing these avenues, we can improve the reliability, usability, and applicability of our models, enabling their effective implementation in real-life scenarios and benefiting various stakeholders involved in water quality management and decision-making processes.

Figure 1. System overview of the whole project.

Figure 2. Location of the water reservoir being studied.

Figure 4. Time series visualization of sensor data.

Figure 5. Distribution of flattened sensor data.

Figure 10. The training loss and accuracy change with epoch.

Figure 11. Comparison of calibration results to original data.

Figure 12. Density distribution comparison of clean data, uncalibrated data, and AutoEncoder-calibrated data.

Table 1. Description of the raw dataset.

Table 2. Confusion matrix for anomaly classification.

Table 3. Distribution comparison among clean data and calibrated data.