Article

Sensor Fusion Enhances Anomaly Detection in a Flood Forecasting System

by Andrew Ma 1,*, Abhir Karande 1, Natalie Dahlquist 1, Fabien Ferrero 2 and N. Rich Nguyen 1
1 Department of Computer Science, University of Virginia, Charlottesville, VA 22904, USA
2 Laboratoire d’Electronique, Antennes et Télécommunications (LEAT), Université Côte d’Azur, CNRS (UMR 7248), 06903 Sophia Antipolis, France
* Author to whom correspondence should be addressed.
J. Sens. Actuator Netw. 2025, 14(2), 34; https://doi.org/10.3390/jsan14020034
Submission received: 6 February 2025 / Revised: 11 March 2025 / Accepted: 21 March 2025 / Published: 25 March 2025
(This article belongs to the Special Issue Fault Diagnosis in the Internet of Things Applications)

Abstract:
To build an Internet of Things (IoT) infrastructure that provides flood susceptibility forecasts at granular geographic levels, an extensive network of IoT weather sensors in local regions is crucial. However, these IoT devices may exhibit anomalous behavior due to factors such as diminished signal strength, physical disturbance, low battery life, and more. To ensure that incorrect readings are identified and addressed appropriately, we devise a novel method for multi-stream sensor data verification and anomaly detection. Our method uses time-series anomaly detection to identify incorrect readings. We expand on the state of the art by incorporating sensor fusion mechanisms between nearby devices to improve anomaly detection ability. Our system pairs nearby devices and fuses them by creating a new time series from the difference between their corresponding readings. This new time series is then input into a time-series anomaly detection model, which identifies whether any readings are anomalous. By testing our system with nine different machine learning anomaly detection methods on synthetic data based on one year of real weather data, we find that our system outperforms previous anomaly detection methods, improving F1-Score by 10.8%.

1. Introduction

1.1. Flooding

Flooding is a prevalent natural catastrophe that poses a significant threat to people’s livelihoods and causes substantial economic losses each year. Flooding is an especially pressing challenge in Vietnam, where about a third of the population today is exposed to flooding [1]. In fact, Vietnam ranks fourth worldwide in terms of population exposure to river floods and third in terms of estimated GDP at risk due to these floods [2]. Floods are so damaging that the long-term effects of a single major flood event can reverberate through the economy for years [3].
Urban flood prediction is the forecasting of floods in densely populated city areas. This is especially significant because cities are densely inhabited, so floods in these areas can cause far more damage. There are two main aspects to addressing urban flooding. One part is to use hydrological flood prediction models to predict floods [4]. The other aspect is to deploy sensors throughout an area to monitor flood features, such as rainfall or water level [5]. We aim to tackle both approaches through Floodwatch, a flood intelligence system for flood-prone countries like Vietnam.

1.2. Floodwatch

Floodwatch (floodwatch.io) is an Internet of Things (IoT) infrastructure for flood intelligence and forecasting [6]. Vietnam was our initial pilot country (as depicted in Figure 1), but our locations of interest have expanded to regions in the United States and Germany. By building accurate flood forecasting models coupled with real-time sensor monitoring, we hope that the user-facing application of Floodwatch will indicate early warnings of floods, thus minimizing the potential damages.
The real-time sensor monitoring of Floodwatch is achieved using IoT devices in a long-range wide-area network (LoRaWAN). Numerous IoT devices deployed around the world are an integral part of the Floodwatch platform. These devices transmit readings such as temperature, precipitation, ullage, or other measurements related to floods and flood forecasting. Since several forecasting models and decision responses may depend on the readings from these sensors, it is essential that the data from these sensors are accurate and verified. Sensor reading anomalies can arise for a variety of reasons, including physical disturbances, inaccurate positioning, poor signal strength, low battery life, or the old age of the sensors [7].

1.3. Anomaly Detection

Much research has been carried out to identify anomalies in data streams. The current state-of-the-art anomaly detection methods are generally deep learning algorithms, as they have the capability to learn from extremely large and complex data [8]. Several recent paradigms also involve the usage of unsupervised and self-supervised learning to deal with the lack of sufficiently labeled anomalous data.
There are three main sources of outliers (or anomalies) in IoT data: error and noise, malicious attacks, and events [9]. Error and noise are anomalies in which the data received differ from the true state of what is being measured. This could be a result of faulty equipment or malformed data. Malicious attacks are where an external attacker deliberately and intentionally compromises the devices. This can occur physically, by tampering with the sensors, or digitally, by hacking into the device or network. Events are outliers that represent an actual change in the environment that is not considered normal. An example of an event could be heavy rain that eventually leads to a flood. Event anomalies are quite different from errors/noise and malicious attacks, as they are a result of intended behavior, so events should be handled differently.
Sensor anomaly detection is an essential feature of our smart city infrastructure. Our goal is to build a system that can identify patterns or deviations in our sensor data that may suggest equipment failures or malicious attacks. This will ensure the integrity of the data.
However, it is often quite difficult to differentiate between faults or intrusions, which we want to flag, and events, which might be important but we do not want to flag. State-of-the-art anomaly detection models generally perform anomaly detection on only the data stream and do not compare it with the device’s environment. This setup results in a high likelihood that events will be classified as false positives [9].
In this paper, we propose that the geographical context of the sensor network can be utilized to produce more meaningful and accurate anomaly detection. Nearby devices will generally report similar weather readings at the same time, so the difference between their corresponding readings is informative: for a non-anomalous pair of time series, the difference will generally be near zero, but, if one sensor is anomalous, the difference will become non-zero.
We can take advantage of this relationship by performing sensor fusion on nearby sensors. We show that, by taking the difference in the readings of nearby devices, we can fuse the data in a way that improves the ability of various anomaly detection models. The benefits are twofold: events are no longer flagged as false positives, and more true anomalies are identified. We evaluate the system’s performance on a wide range of anomaly detection methods, demonstrating our system’s model-agnostic ability to succeed with any state-of-the-art anomaly detection method.

2. Related Works

2.1. Time-Series Anomaly Detection Methods

Prior work on time-series anomaly detection can be largely categorized into statistical, machine learning (ML)-, and deep learning-based models. Statistical models include ARIMA and Holt–Winters, and basic examples of ML algorithms include the One-Class Support Vector Machine and Isolation Forests [10,11].
The current state-of-the-art in time-series anomaly detection is deep learning methods, as they have the ability to learn from extremely large and complex data [8]. With their large number of parameters and large collection of training data, deep learning models can identify and understand intricate patterns much better than other methods, resulting in improved performance on several benchmark datasets.
Supervised deep learning methods are not necessarily effective due to the unlabeled (or weakly labeled) nature of many anomaly detection problems. Instead, there are two prominent strategies for deep learning-based anomaly detection: forecast-based or reconstruction-based [12]. In these strategies, the model is trained with only the input time-series values and does not require any labels regarding whether a value is anomalous or not.
Forecasting methods work by training a model to predict future time-series values. If a time series is not anomalous, the model’s prediction will be close to the actual time series. However, if the prediction error is greater than a threshold, then that point is labeled as anomalous [13]. Long Short-Term Memory (LSTM) cells, which are commonly used for time-series prediction, are the most commonly used architecture for prediction-based methods [14,15]. Forecasting models are generally inferior to reconstruction methods as they struggle to predict over long time horizons, meaning that they are only effective over short detection windows.
Reconstruction methods train an autoencoder neural network to reconstruct an input time series. Similar to forecasting methods, they will reconstruct non-anomalous time series well, but struggle to reconstruct anomalous time series [12]. The error between the reconstructed time series and the original time series is the reconstruction error. If the reconstruction error for a series of points is above a specified threshold, then it is likely that the series of points contains an anomaly. USAD and DAE are some basic examples that employ an autoencoder architecture for time-series anomaly detection [16,17].
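The reconstruction-error criterion described above can be sketched in a few lines. Here a trivial moving average stands in for a trained autoencoder (an assumption for illustration only; real methods such as USAD or EncDec-AD use learned models):

```python
def moving_average(series, k=3):
    """A toy 'reconstruction': each point is replaced by a trailing mean."""
    out = []
    for i in range(len(series)):
        window = series[max(0, i - k + 1): i + 1]
        out.append(sum(window) / len(window))
    return out

def reconstruction_anomalies(series, reconstruct, threshold):
    """Flag indices where |original - reconstruction| exceeds the threshold."""
    recon = reconstruct(series)
    return {i for i, (x, r) in enumerate(zip(series, recon)) if abs(x - r) > threshold}

series = [1.0, 1.1, 0.9, 8.0, 1.0, 1.05]  # spike at index 3
flagged = reconstruction_anomalies(series, moving_average, threshold=3.0)
```

A learned autoencoder would simply replace `moving_average` here; the thresholding logic is the same.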
Machine learning methods designed for sequential inputs, such as Recurrent Neural Networks (RNNs), LSTM, or Transformers, are especially strong candidates for reconstruction-based methods. Some notable approaches include RNN Encoder–Decoder, an RNN-based autoencoder, and EncDec-AD, an LSTM-based autoencoder [18,19]. More recent work has explored Transformer-based reconstruction methods, such as TranAD and AnomalyTransformer [20,21].
Separately, Graph Neural Networks (GNNs) have also been used to detect anomalies in multivariate time series [22]. These models are able to perform anomaly detection for multivariate data by explicitly modeling the inter-dependencies between variables. This is particularly useful in sensor networks, where relationships between time series can be complex and nonlinear. By learning a graph representation of these inter-variable relationships, GNNs enable more robust anomaly detection by identifying deviations from expected structural patterns rather than just individual time-series behaviors.
The methods defined in this section are all effective in identifying anomalies in a given time series. However, they are only standalone algorithms that work best in an isolated environment. In order to effectively detect anomalies in a real-world setting, a complete infrastructure must be built around these models.

2.2. Applications of Anomaly Detection

In order to effectively detect anomalies in a real-world setting, a complete infrastructure must be built around the standalone methods described above. As such, much research has also been conducted on systems that apply these anomaly detection algorithms.
Anomaly detection methods are commonly applied to the realm of IoT networks and other wireless sensor networks [23,24,25]. This is because these networks are susceptible to distributed denial-of-service (DDoS), botnet attacks, or other cybersecurity issues. A successful attack on an IoT network can completely shut down the system, which may result in large unintended consequences for users.
Researchers in [25] describe the use of SVMs for fault (errors or noise) detection in wireless sensor networks. On top of the generic SVM for classification, they propose a two-phase system with a learning phase and a real-time phase. This structure emphasizes precomputation and minimizes the computation time for inference. This system still has room for improvement, though: even when it identifies a fault, it takes no action to address it.
Intrusion detection systems (IDSs) are systems built to mitigate the effectiveness of DDoS or botnet attacks. An IDS must do more than just detect anomalies: it should process the input data, perform anomaly detection, and then take action on those anomalies. The researchers in [24] propose an IDS that classifies the incoming data, performs anomaly detection, and then blacklists any sources that send anomalous packets. This ensures that, once the attack is identified, it will be stopped. Another important aspect is that any anomaly detection model can be used for the anomaly detection step of the system; in other words, the system is model agnostic. This decoupling is important as it allows the model to be replaced with evolving state-of-the-art anomaly detection models as needed.
Researchers in [26] have developed an anomaly detection system specifically designed for wireless sensor networks (WSNs) by integrating edge-based and cloud-based computing. Their system distinguishes between short-term anomalies, which may result from transient sensor noise, and long-term anomalies, which indicate persistent faults or environmental changes. The edge-based method utilizes a Restricted Boltzmann Machine (RBM) to detect anomalies in real-time without the need for extensive historical data storage. The cloud-based method simultaneously utilizes a multi-parameterized edit distance (MPED) algorithm to detect deviations in sensor correlations. This combined approach enables anomaly detection that is both resource-efficient and scalable.
These systems present promising methods to perform anomaly detection for wireless sensors or IoT devices. However, anomaly detection for weather sensors is unique in a few ways. For one, extreme weather events (heat waves, extreme rain, dry air) are not uncommon, and it is important that these readings are not marked as false-positive anomalies. Additionally, weather sensors in the same region are likely to produce similar, or at least correlated, readings [9]. Anomaly detection for weather sensors can be further improved by utilizing these observations and fusing sensor readings.

2.3. Sensor Fusion

In the weather sensing environment where a network of numerous sensors is deployed, much of the data from nearby sensors can be redundant [27]. This redundancy can be utilized by corroborating the data from one sensor with another sensor, also known as sensor fusion, which can lead to increased sensor reliability. Sensor fusion is when sensor data are combined and utilized in a way that the resulting information is more useful than both sources individually [28].
Sensor fusion for IoT can be classified into one of three categories: Probability-based, Artificial Intelligence-based, and Theory of Evidence-based [29]. Probability-based methods, such as Probabilistic Data Association, rely on statistical models to estimate the true value of a measured variable by accounting for uncertainties and noise in sensor readings. AI-based methods employ machine learning algorithms to learn patterns from sensor data and make predictions or classifications. Theory of Evidence-based methods, like Dempster–Shafer theory, combine evidence from different sources to calculate the probability of an event.
The key difficulty that sensor fusion can help address in IoT anomaly detection is distinguishing between environmental anomalies, where the environment itself behaves anomalously (events), and sensor anomalies, where only the sensor’s readings are anomalous (errors, noise, attacks). By fusing a device’s readings with those of other nearby devices, events can easily be distinguished: during an event, all sensors display anomalous readings, whereas errors are present in only a single sensor.
Several researchers have explored sensor fusion in heterogeneous wireless sensor data streams. Researchers in [30] proposed an approach for monitoring wireless sensor networks by identifying hidden correlations between sensor readings. Their method utilizes sequence alignment techniques to detect anomalies, analyzing the relationships between heterogeneous data streams rather than treating sensor readings in isolation. Similarly, Ref. [31] introduced a multisensor data fusion (MDF) algorithm that exploits hidden correlations. Their approach first aligns sensor data using sequence alignment methods, and then it applies an MDF technique that adapts to the varying semantics of different applications. Both of these approaches are particularly effective in distinguishing genuine environmental anomalies from sensor faults, a challenge commonly encountered in IoT anomaly detection systems.

3. Materials and Methods

Through our literature review, we identified several aspects of time-series anomaly detection methods that have been proven to work very well. Leveraging the advancements in anomaly detection technologies, our objective is to develop a sophisticated sensor data verification system for Floodwatch that integrates the strengths of deep learning-based anomaly detection methods, a robust, model-agnostic framework, and corroborative sensor data fusion. This system aims to accurately identify any errors, noise, and malicious attacks in our sensor data streams, while minimizing the false positives of event-type outliers.
We developed our system on the premise that sensor readings from proximal devices at similar times should produce highly similar readings [9]. We proposed that fusing and comparing device readings with other nearby sensors would improve the performance of existing anomaly detection algorithms. There are two lines of reasoning for this.
In the case of an anomaly caused by error, noise, or a malicious attack, only a singular device’s data will have an outlier, and that device’s readings will be greatly different from any surrounding devices. Comparing the anomalous device’s readings with any other nearby device will make the outlier very obvious, as the data of nearby devices should normally be very similar.
The other case is an event-type outlier, such as a heavy rainfall event. Since the goal of this system is data verification, this type of outlier should not be flagged. If an extreme weather time series is examined by itself, it is likely that an anomaly detection algorithm will identify it as anomalous. However, when there is an extreme weather event, all of the devices in an area will transmit those extreme weather conditions. Therefore, if the device’s data are compared with other nearby devices, it is unlikely that the extreme weather event will be labeled as anomalous.
There are five main stages in our sensor anomaly detection system: device grouping, device pairing, time-series data fusion, anomaly detection, and weather verification. An overview of our methodology can be seen in Figure 2.

3.1. Device Grouping

The first step is to group our collection of sensors $S_{all}$ by location. Floodwatch devices are currently deployed in multiple cities across Vietnam, Germany, and the United States. Sensors that are far away from each other do not need to be compared since they will not produce correlated readings. Therefore, we create groups of devices that we plan on fusing with each other based on proximity.
To effectively group sensors, we utilize the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. DBSCAN relies on two supplied parameters: the maximum distance for two sensors to be considered neighbors ($\epsilon$) and the minimum number of sensors required within the $\epsilon$ radius to form a dense region. The algorithm first identifies core points, which are those surrounded by a sufficient number of other points within the defined $\epsilon$ radius. It then constructs clusters by linking core points that are mutually reachable within each other’s $\epsilon$ radius [32].
DBSCAN is chosen for two reasons: we do not know the exact number of clusters of sensors, and the $\epsilon$ parameter very closely represents a measure of how close we want sensors to be in order to compare them. We can apply DBSCAN to the devices by creating a set of points $S_{latlong}$ where each point is a device’s latitude and longitude:
$S_{latlong} = \{ (s_{lat}, s_{long}) \mid s \in S_{all} \}.$
The distance metric used by DBSCAN is the geodesic distance between the coordinates. Running DBSCAN on this set of points results in $k$ clusters:
$C = \mathrm{DBSCAN}(S_{latlong}) = \{ \{ s_{1a}, s_{1b}, \dots \}, \{ s_{2a}, s_{2b}, \dots \}, \dots, \{ s_{ka}, s_{kb}, \dots \} \}.$
The resulting clusters $C$ are $k$ groups of sensors, where each group is geographically close enough to be fused with each other. The sensors within these groups will be paired in the next step.
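As an illustration of this grouping step, the following is a minimal pure-Python DBSCAN over haversine distances (an approximation of the geodesic distance on a spherical Earth). The implementation and the sample coordinates are ours, for illustration, not the production clustering code:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def dbscan(points, eps_km, min_pts):
    """Return cluster labels (-1 = noise) for a list of (lat, lon) points."""
    labels = [None] * len(points)
    # Precompute epsilon-neighborhoods (each point is its own neighbor).
    neighbors = [[j for j, q in enumerate(points) if haversine_km(p, q) <= eps_km]
                 for p in points]
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may be claimed as a border point later
            continue
        cluster += 1
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # core point: keep expanding
                frontier.extend(neighbors[j])
    return labels

# Two sensors near Da Nang plus one distant sensor (hypothetical coordinates).
sensors = [(16.06, 108.22), (16.07, 108.23), (10.78, 106.70)]
labels = dbscan(sensors, eps_km=5.0, min_pts=2)
```

The two nearby sensors share a cluster label while the distant one is marked as noise, mirroring how far-apart devices are excluded from fusion.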

3.2. Device Pairing

For each cluster of devices C k , we will pair each device’s readings with one other device in the same cluster for fusion. We aim to minimize the distance between the paired devices, as readings will be more correlated if the two devices are closer. As such, the goal is to pair devices such that the sum of distances between pairs is minimized. This pairing can be produced by converting the network of devices into a complete graph and utilizing a minimum weight matching algorithm.
Let $G_k = (V, E)$ be the graph of devices where
$V = \{ s \mid s \in C_k \}$
$E = \{ (s_i, s_j, d_{geo}(s_i, s_j)) \mid s_i, s_j \in C_k \text{ and } i \neq j \}.$
$V$ contains every device in the cluster, and $E$ contains an edge between every pair of devices with weight $d_{geo}(s_i, s_j)$, the geodesic distance between devices $s_i$ and $s_j$. This graph is a complete graph, as every vertex has an edge to every other vertex.
We can find an optimal pairing of devices by finding a minimum weight matching of the graph representation of the sensors, which is solved using Edmonds’ algorithm [33]. A matching $M$ of a graph is a set $M \subseteq E$ in which no two edges share a vertex [34]. In our context, each edge in a matching represents a sensor fusion pair between two devices, and the edge weight is the distance between the devices. To maximize the number of devices paired, we want a matching with maximum cardinality, i.e., a maximum matching [35]. Additionally, we want to minimize the sum of the edge weights, thus minimizing the total distance between paired devices.
We generate pairs by running Edmonds’ algorithm on $G_k$ to obtain a set of edges that we use to generate pairs of devices $P_k$ for fusion:
$P_k = \mathrm{Edmonds}(G_k) = \{ (s_a, s_b), (s_c, s_d), \dots \}.$
In cases where $|V|$ is even, it is guaranteed that every vertex will have an adjacent edge in $\mathrm{Edmonds}(G_k)$, as $G_k$ is a complete graph. In cases where $|V|$ is odd, there will be one device left out. This device is simply paired with its closest neighboring device. The result of this step is a set of pairings $P_k$ for every cluster of sensors $C_k$. These pairings will be fused together in the next step.
An example of minimum weight matching resulting in an optimal pairing of sensors can be seen in Figure 3.
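The pairing objective can be sketched with a brute-force search in place of Edmonds' blossom algorithm (an assumption for illustration; exhaustive search is only feasible for the small clusters typical of a single city, whereas Edmonds' algorithm scales polynomially):

```python
def min_weight_pairing(devices, dist):
    """Brute-force minimum-weight matching: pair devices so that the sum of
    pairwise distances is minimized. A stand-in for Edmonds' algorithm."""
    ids = list(range(len(devices)))
    best = (float('inf'), None)

    def recurse(remaining, pairs, total):
        nonlocal best
        if total >= best[0]:
            return  # prune: already worse than the best pairing found
        if len(remaining) <= 1:  # odd leftover device is handled separately
            best = (total, pairs)
            return
        a = remaining[0]
        for b in remaining[1:]:
            recurse([x for x in remaining if x not in (a, b)],
                    pairs + [(a, b)],
                    total + dist(devices[a], devices[b]))

    recurse(ids, [], 0.0)
    return best[1]

# Four devices on a line (positions in km, hypothetical); optimal pairing
# matches the two close pairs rather than crossing the gap.
pairs = min_weight_pairing([0.0, 1.0, 10.0, 11.0], lambda a, b: abs(a - b))
```

For real coordinates, `dist` would be the geodesic distance used elsewhere in the pipeline.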

3.3. Sensor Fusion

After the sensors are paired, we fuse every pair of sensors in $P_k$. First, we identify a sequence of timestamps $T$. This is needed because IoT sensor readings are not always temporally aligned. Let $T$ be a sequence of timestamps a fixed $\Delta T$ apart:
$T = \{ T_{start}, T_{start} + \Delta T, T_{start} + 2 \Delta T, \dots, T_{end} \}.$
Then, we fuse the sensor readings by taking the difference between corresponding readings at the specified timestamps. For each pair $(s_a, s_b)$ in $P_k$, we fuse the data as follows:
$F = \mathrm{Fuse}(s_a, s_b) = \{ s_a(t) - s_b(t) \mid t \in T \},$
where $s(t)$ is the interpolated reading of sensor $s$ at time $t$. While this is simpler than most sensor fusion methods, this differential fusion still corroborates the paired sensors’ data in a valuable way. The differences between these readings will generally be close to 0 for nearby sensors, and may even exhibit specific patterns. This step effectively removes the weather information from the time series, and the resulting time series is just the difference in readings between sensors. An example of the proposed interpolation and difference can be seen in Figure 4.
The result of this step is a fused time-series dataset F for every pair of sensors in every cluster. In the next step, each of these time series will be run through an anomaly detection model.

3.4. Anomaly Detection

Next, we perform anomaly detection on the new time series $F$. This system is model-agnostic, meaning that any multivariate time-series anomaly detection method $\mathrm{Anomaly}(\cdot)$ can be used here. This is important because it completely decouples the system from the anomaly detection model. Since the model is completely separate, models can be chosen based on factors such as training data availability, available computation power, the current state of the art, and personal preference.
For example, in practice, it is difficult and impractical to label thousands of historic data points. Therefore, one may choose to use a self-supervised or unsupervised anomaly detection model. Alternatively, if a priority is to minimize server costs, one may opt for a simpler anomaly detection model with fewer parameters to reduce computation. Finally, if newer and more accurate anomaly detection models are published, they can be seamlessly plugged into this system without much change.
For every new time series $F$ that is generated, we obtain a set of timestamps of anomalies $A$ as follows:
$A = \mathrm{Anomaly}(F) = \{ t_1, t_2, \dots \}.$
$\mathrm{Anomaly}(\cdot)$ is trained with the historical data of devices paired in the same fashion as described above. These trained models take the combined time series as input and output a set of indices $A$ that are anomalous. These points are then compared to the WeatherAPI in the next step for a final confirmation.
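Because the stage is model-agnostic, any detector can fill the $\mathrm{Anomaly}(\cdot)$ slot. As a placeholder (our assumption, not one of the paper's nine evaluated models), a simple z-score detector over the fused series illustrates the interface:

```python
def zscore_anomalies(series, threshold=3.0):
    """Return the set of indices whose deviation from the series mean
    exceeds `threshold` standard deviations."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    std = var ** 0.5 or 1.0  # guard against a perfectly flat series
    return {i for i, x in enumerate(series) if abs(x - mean) / std > threshold}

# A fused (difference) series that is near zero except for one spike.
fused = [0.1, -0.2, 0.0, 0.1, 9.5, -0.1, 0.2]
anomalies = zscore_anomalies(fused, threshold=2.0)
```

Any forecast- or reconstruction-based model returning a set of anomalous timestamps could be swapped in without touching the rest of the pipeline.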

3.5. WeatherAPI Verification

The list of anomalous timestamps $A$ refers to timestamps of the fused data. Therefore, we have to map the fused timestamps back to the original device data timestamps. This is achieved by simply finding the closest timestamp in the sequence of timestamps $s_{time}$, which are the timestamps of the readings of sensor $s$:
$f_s(t) = \arg\min_{s_t \in s_{time}} | t - s_t |.$
For both sensors in the pair, all timestamps in $A$ are mapped back to sensor reading timestamps using $f_s(t)$:
$A_s = \{ f_s(t) \mid t \in A \}.$
This step results in two sets of indices $A_s$, one for each sensor in the pair.
Then, we cross-check each flagged reading against WeatherAPI data to further confirm any potential anomalies. WeatherAPI is an online API that provides weather data for any point in the world at any time, so comparing the anomalous sensor readings with the WeatherAPI provides a good secondary source of truth. When comparing readings from a sensor to the WeatherAPI, we define a reading to be anomalous if
$\mathrm{Anomalous}(s, t) = | s(t) - \mathrm{WeatherAPI}(s_{latlong}, t) | > \tau,$
where $\tau$ is some threshold. We filter the anomalous indices from $A_s$ to $A'_s$ as follows:
$A'_s = \{ t \in A_s \mid \mathrm{Anomalous}(s, t) \}.$
This step further ensures that no anomalous weather pattern is falsely labeled as an anomaly. It also helps to determine which of the underlying devices was responsible for the anomaly in the combined time series.
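The timestamp mapping and verification steps can be sketched together; here `weather_ref(t)` is a hypothetical helper standing in for the WeatherAPI lookup at the sensor's coordinates:

```python
def map_to_sensor_time(t, sensor_times):
    """f_s(t): the sensor reading timestamp closest to a fused timestamp t."""
    return min(sensor_times, key=lambda st: abs(t - st))

def verify(anomalies, sensor, sensor_times, weather_ref, tau):
    """Map fused anomaly timestamps back to sensor timestamps, then keep only
    those where the sensor disagrees with the reference by more than tau."""
    mapped = {map_to_sensor_time(t, sensor_times) for t in anomalies}
    return {t for t in mapped if abs(sensor[t] - weather_ref(t)) > tau}

sensor_times = [0, 5, 10, 15]
sensor = {0: 20.0, 5: 21.0, 10: 35.0, 15: 22.0}  # spurious spike at t=10
# weather_ref is a constant here purely for illustration.
confirmed = verify({4, 11}, sensor, sensor_times, lambda t: 21.0, tau=3.0)
```

Only the reading that genuinely diverges from the reference survives the filter, which is how event-type outliers (where both sensor and reference move together) are screened out.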
With an accurate list of all detected anomalies $A'_s$ for each device, the anomaly detection service sends that data to the floodwatch.io API, where the corresponding points are marked as anomalous in our database. Precisely, the steps of our method can be summarized as Algorithm 1.
Algorithm 1 Sensor Anomaly Detection System
Require: Set of all sensors $S_{all}$
Ensure: Detected anomalies sent to floodwatch.io API
1: $S_{latlong} = \{ (s_{lat}, s_{long}) \mid s \in S_{all} \}$
2: $C = \mathrm{DBSCAN}(S_{latlong})$
3: for each cluster $C_k$ in $C$ do
4:     $G_k = (V, E)$, $V = \{ s \mid s \in C_k \}$, $E = \{ (s_i, s_j, d_{geo}(s_i, s_j)) \mid s_i, s_j \in V,\ i \neq j \}$
5:     $P_k = \mathrm{Edmonds}(G_k)$
6:     for each pair $(s_a, s_b)$ in $P_k$ do
7:         $F = \mathrm{Fuse}(s_a, s_b)$
8:         $A = \mathrm{Anomaly}(F)$
9:         for each sensor $s$ in $(s_a, s_b)$ do
10:            $A_s = \{ f_s(t) \mid t \in A \}$
11:            for each anomaly timestamp $t$ in $A_s$ do
12:                if $\mathrm{Anomalous}(s, t)$ then add $t$ to $A'_s$
13:            Output anomalies $A'_s$

4. Experiments

4.1. Potential Data Sources

We have explored several anomaly detection datasets that we could potentially use for our experiments. Common anomaly detection benchmarking datasets include SMD, SWaT, WaDI, MSL, and SMAP [36]. Geographical location and temporal similarity are either anonymized or not relevant within these datasets [37,38,39,40]. Since our system leverages the location and temporal similarity of a network of sensors, we want these qualities to be present in our experiment data. Therefore, these commonly used anomaly detection datasets do not suffice for our research purposes.
We would ideally use the Floodwatch platform’s device readings for our experiments. Our network has over 30 LoRaWAN rain gauge devices deployed throughout the world. These rain gauge devices transmit temperature, humidity, and precipitation readings every 5 min. Some of these devices have been transmitting data for as long as nine months, while others are newly deployed and have transmitted only a few weeks of data. However, our real-time sensor streams do not suffice for our experiments for two main reasons. First, they do not have a continuous uptime, so they cannot be used for large-scale experiments. Secondly, they have an inherently low anomaly rate, which is also not ideal for use in experiments.

4.2. Synthetic Device Data

Several other papers have used simulated data to test anomaly detection models and systems, including in crowd management, elderly care, and nuclear reactor settings [41,42,43]. We opt to follow their protocol and use synthetic data for our experiments. As suggested by [44], we use real, measured data as a basis for the simulated device readings.

4.2.1. Simulated Devices

We gathered one year of hourly temperature, humidity, and precipitation values for Ho Chi Minh City from WeatherAPI as a base time series [45]. We then simulate eight different IoT devices, where each device corresponds to one of the LoRaWAN devices we have deployed in Da Nang (as shown in Figure 3). We utilize the exact locations of these real sensors, which is important for the device pairing step. The distances between the sensors range from 0.44 to 8.43 km. The resulting eight synthetic data streams are then derived from the initial time series.
We simulate each device similarly to how its real-world counterpart behaves. Of the eight synthetic devices, one has a constant offset, one has an oscillating offset, one has no rain, and five are simulated as normal. We simulate the five normal devices by adding Gaussian white noise $\mathcal{N}(0, \sigma)$, where $\sigma$ is a uniformly random standard deviation between 0.05 and 0.15, to the temperature and humidity readings. To add noise to the precipitation readings, we add Gaussian white noise $\mathcal{N}(0, \sigma)$, where $\sigma$ is a uniformly random standard deviation between 0.01 and 0.05. We only add Gaussian white noise to non-zero precipitation values, as it does not make sense for there to be noise when there is no rain to measure. Additionally, the precipitation values are clipped so that all negative values are set to zero.
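The noise injection for a "normal" simulated device can be sketched as follows (the function name and sample values are ours; the $\sigma$ ranges match those stated above):

```python
import random

def simulate_normal(temp, humid, precip, seed=0):
    """Add Gaussian white noise per the stated ranges: sigma in [0.05, 0.15]
    for temperature/humidity, sigma in [0.01, 0.05] for precipitation.
    Noise is applied only to non-zero precipitation, clipped at zero."""
    rng = random.Random(seed)
    s_th = rng.uniform(0.05, 0.15)
    s_p = rng.uniform(0.01, 0.05)
    noisy_t = [x + rng.gauss(0, s_th) for x in temp]
    noisy_h = [x + rng.gauss(0, s_th) for x in humid]
    noisy_p = [max(0.0, x + rng.gauss(0, s_p)) if x > 0 else 0.0 for x in precip]
    return noisy_t, noisy_h, noisy_p

t, h, p = simulate_normal([30.0, 31.0], [80.0, 82.0], [0.0, 1.2])
```

Zero-rain readings stay exactly zero, and no negative precipitation values can occur.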
The device with a constant offset is representative of a device on the Floodwatch network whose readings we observe to be constantly slightly offset from WeatherAPI readings. This is possibly due to its placement in a location with slightly different conditions (e.g., a rooftop). We simulate this device by adding Gaussian noise with a non-zero mean to the temperature and humidity columns. The precipitation column is simulated the same way as for the normal devices. This device is shown in Figure 5.
The device with an oscillating offset is representative of a device on the Floodwatch network that senses higher temperature and humidity at night but has normal readings during the day. We simulate this device by adding a sine function and Gaussian white noise to the temperature and humidity columns.
The device with no rain is representative of a device on the Floodwatch network that does not transmit precipitation. The exact reason for this is unknown; likely causes include a faulty precipitation sensor, placement under a roof, or indoor installation. This device is simulated by simply setting its precipitation values to 0 for certain periods of time. More detail regarding the anomalous behavior of this sensor is discussed in the next section.
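The three anomalous device behaviors above can be sketched as follows. The offset magnitudes (+1 °C, +3% humidity) come from Figure 5; the sine amplitude and period are illustrative assumptions, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def constant_offset_device(temp, humid, offset_t=1.0, offset_h=3.0, sigma=0.1):
    """Non-zero-mean Gaussian noise: mimics the rooftop device offset by
    roughly +1 degree C and +3 humidity % (Figure 5)."""
    temp = np.asarray(temp, float)
    humid = np.asarray(humid, float)
    return (temp + rng.normal(offset_t, sigma, temp.shape),
            humid + rng.normal(offset_h, sigma, humid.shape))

def oscillating_offset_device(temp, sigma=0.1, amplitude=1.5, period=24):
    """Sine offset over the day plus Gaussian white noise; amplitude and
    phase here are assumptions, not the paper's exact parameters."""
    temp = np.asarray(temp, float)
    hours = np.arange(temp.size)
    offset = amplitude * np.sin(2 * np.pi * hours / period)
    return temp + offset + rng.normal(0.0, sigma, temp.shape)

def no_rain_device(precip, start, end):
    """Zero out precipitation for a chosen window, mimicking a faulty or
    sheltered rain gauge."""
    precip = np.asarray(precip, float).copy()
    precip[start:end] = 0.0
    return precip
```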

4.2.2. Simulated Anomalies

After simulating the time series, we manually add and label anomalies in the simulated device data. We include anomalies primarily in the test set, as we do not want the unsupervised models to learn anomalies as “normal”. This is similar to how the SWaT dataset provides two separate data streams, one recorded with the testbed running in a normal state and one in an attacked state [38].
We are careful to include both point and contextual anomalies [46,47]. See Figure 6 for examples of both. While some anomalies are manually added, the bulk of them are randomly generated. In total, 9% of the data points in the test set are labeled as anomalous.
As mentioned earlier, we also manually set several precipitation values to 0 as anomalies. These changes are labeled as anomalies, since failing to detect rain qualifies as anomalous behavior. However, other behaviors, such as a constant or oscillating offset, do not qualify as anomalies. These deviations from the actual weather are simply a product of the sensor’s specific environment, such as being on a roof, and do not negatively impact flood forecasting ability.
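The anomaly-injection step can be sketched as follows. This is a hedged reconstruction: the spike height, counts, and window length are illustrative parameters, not the paper's values, and numpy is assumed.

```python
import numpy as np

rng = np.random.default_rng(2)

def inject_anomalies(series, n_point=5, spike=5.0, context_len=12):
    """Insert point anomalies (isolated spikes far outside the local range)
    and one contextual anomaly (values plausible in magnitude but wrong for
    their context), returning the modified series and binary labels."""
    series = np.asarray(series, float).copy()
    labels = np.zeros(series.size, dtype=int)

    # Point anomalies: single-sample spikes.
    idx = rng.choice(series.size, size=n_point, replace=False)
    series[idx] += spike
    labels[idx] = 1

    # Contextual anomaly: hold one value flat where the signal should vary.
    start = int(rng.integers(0, series.size - context_len))
    series[start:start + context_len] = series[start]
    labels[start:start + context_len] = 1

    return series, labels
```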

4.3. Procedures

To test the system, we focus specifically on the anomaly detection model’s performance, as executing the entire anomaly detection system (device grouping, sensor pairing, and WeatherAPI corroboration) is not feasible for large-scale experiments. From the original eight simulated devices, we create two datasets to train the models. One dataset, D s o t a , represents the current state of the art and includes the synthetic data in its original form; it contains the raw sensor readings without any fusion. The other dataset, D f l o o d w a t c h , represents using our system. This dataset uses the device locations to compute a minimum-weight matching and uses the resulting pairs to generate combined time series, just as described in Section 3.
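The pairing and fusion used to build D_floodwatch can be sketched as follows. This is a minimal sketch, not the authors' implementation: matching is done by exhaustive search (fine for eight devices, whereas a real deployment would use a polynomial-time algorithm such as Blossom), and the streams are assumed to be already aligned, whereas the paper linearly interpolates readings before differencing (Figure 4).

```python
import itertools
import math

def min_weight_pairing(coords):
    """Exhaustive minimum-weight perfect matching over device coordinates.
    coords: list of (x, y) positions; the list length must be even."""
    ids = list(range(len(coords)))

    def dist(a, b):
        return math.dist(coords[a], coords[b])

    def matchings(rest):
        # Recursively enumerate all perfect matchings of the remaining ids.
        if not rest:
            yield []
            return
        first, tail = rest[0], rest[1:]
        for i, partner in enumerate(tail):
            for m in matchings(tail[:i] + tail[i + 1:]):
                yield [(first, partner)] + m

    return min(matchings(ids),
               key=lambda m: sum(dist(a, b) for a, b in m))

def fuse(stream_a, stream_b):
    """Fused time series: element-wise difference of two aligned streams."""
    return [a - b for a, b in zip(stream_a, stream_b)]
```

For two tight clusters of devices, the matching pairs each device with its nearest cluster-mate, and the fused stream of a healthy pair hovers near zero.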
We run experiments with nine models spanning unsupervised and self-supervised approaches: AnomalyBERT, AnomalyTransformer, COUTA, DeepIsolationForest, DeepSVDD, TcnED, TimesNet, TranAD, and USAD. We choose these models because they do not require labels during training and they represent a good variety of ML methodologies. Label-free methods are preferred due to the unlabeled nature of anomaly data: while our dataset includes labels because the anomalies are simulated, it is difficult to obtain anomaly labels for real sensor data. We use the model implementations from sklearn, the AnomalyBERT paper, and the DeepOD v0.4.1 GitHub repository [48,49,50].
The steps of our experiments for each model are as follows. First, we simulate an environment where we would not use our system: the model is trained on D s o t a , and the model’s test precision, recall, and F1-score are recorded. Then, we simulate an environment where we use our system to create combined data streams: the model is trained on D f l o o d w a t c h , and the model’s test precision, recall, and F1-score are again recorded.
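The three reported metrics are the standard binary-classification definitions. The helper below writes them out explicitly (it is equivalent to scikit-learn's positive-class precision/recall/F1, included here only to make the definitions concrete):

```python
def prf1(y_true, y_pred):
    """Precision, recall, and F1 for binary anomaly labels (1 = anomaly)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```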

5. Results

After testing each model with and without our method, we record the precision, recall, and F1-score in Table 1. We discuss both the difference in performance within the same model and the difference in performance between models.
We record the change in precision, recall, and F1-score for each model in Table 2. Across all nine models, anomaly detection performance increases by 0.08 F1-score on average when implementing our method. Additionally, none of the models decrease in performance when utilizing our system. TcnED improves the most, with an F1-score increase of 0.22. Conversely, AnomalyBERT, AnomalyTransformer, and COUTA show relatively small increases, ranging from 0.01 to 0.03 in F1-score. We also observe that recall increases more than precision when using our system.

Analysis of Results

TcnED performs the best when using our method (0.82 F1-score). The DeepIsolationForest model performs the best without our method, and is a close second when using our method. Conversely, TimesNet performs quite poorly on this dataset, regardless of using our method.
The overall increase in F1-score supports our hypothesis that combining device data streams makes anomaly detection methods more robust. Across the board, every anomaly detection model sees an improvement in performance, but the magnitude of that improvement varies drastically between models: some architectures improve significantly while others improve only marginally.
Effectively, the models fall into two groups: those whose F1-score improvement is driven primarily by precision and those whose improvement is driven by recall. The Transformer-based models (AnomalyBERT, AnomalyTransformer, and TranAD) belong to the former group, while the remaining methods (COUTA, DeepIsolationForest, DeepSVDD, TcnED, TimesNet, and USAD) belong to the latter.
Overall, recall increased more than precision, indicating an overall decrease in false negatives. This means that fusing the sensors identifies anomalies that were previously missed. A portion of these anomalies likely come from the no-rain sensor. From a single stream of sensor data, it is nearly impossible to detect a sensor that fails to detect rain; the sensor always reads 0, which is not in itself anomalous. However, when fusing data, the difference makes it obvious that the device is not detecting any rain. This is one of the key types of anomalies that our system can detect but that cannot be detected without it.
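This effect is easy to see on a toy example (hypothetical numbers, chosen only for illustration): a stuck-at-zero rain gauge is indistinguishable from a dry day on its own stream, but the fused difference against a paired neighbor exposes it the moment it rains.

```python
# Neighbor measures a rain event; the faulty device never reports rain.
healthy = [0.0, 0.0, 3.2, 5.1, 1.0, 0.0]
stuck = [0.0] * 6

# Single-stream view: the faulty device is perfectly constant, so there is
# no variation for an anomaly detector to flag.
single_stream_range = max(stuck) - min(stuck)

# Fused view: the element-wise difference is large whenever it rains,
# making the stuck sensor obviously anomalous.
fused = [a - b for a, b in zip(healthy, stuck)]
fused_peak = max(abs(v) for v in fused)
```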

6. Discussion

The results of our study demonstrate a significant improvement in anomaly detection performance when employing our sensor fusion approach, which corroborates data between proximal sensors. Across the nine tested anomaly detection models, our system increases performance by 0.083 F1-score on average, a 10.8% relative improvement.
There are several advantages to using the method described in this paper. For one, it is able to detect more anomalies, as some anomalies cannot be detected from a singular sensor stream. This ability is reflected in the increase in recall. Secondly, the sensor corroboration prevents the incorrect classification of events as anomalies. Additionally, the model-agnostic nature of our system ensures compatibility with evolving machine learning techniques, maintaining flexibility for future advancements in anomaly detection.

6.1. Applications to Other IoT Settings

Although this framework was designed for Floodwatch, it has applications to several other projects of similar nature. In fact, any IoT system that has redundant sensors can utilize this system. Some examples include soil/air sensors in smart farming or air quality sensors for smart cities [55,56]. The sensors for these IoT systems are also spatio-temporally related, and that relationship can be utilized for corroboration as described in this article.

6.2. Limitations

One practical limitation of this approach is the assumption of uniform environmental conditions. The sensor fusion relies on weather being similar in nearby areas, which may not hold true in every situation. Some nearby locations may have drastically different weather based on factors such as elevation or distance from water. Small and predictable variations (such as oscillating or constant offsets from our experiments) can be addressed by our system, as shown from our experiments. Larger variations are addressed in our system by the WeatherAPI corroboration. However, not all other applications of this system will have a ground-truth API to utilize.
Another practical limitation is the scalability of this approach. Energy usage is not a concern, as all of the processing occurs in the cloud. However, network latency or sensor downtime could degrade the fused data streams. Floodwatch sensors currently experience neither, owing to our smaller network size and proximity to the LoRaWAN sensors. Since these issues may appear as Floodwatch grows, or may be present in other systems, we leave that research to future studies.
Another limitation is the possibility of coordinated attacks on devices. If a malicious actor determines how the devices are paired, they could attack both sensors of a pair at exactly the same time in exactly the same way. The fused time series would then appear normal, even though the underlying data themselves are anomalous. Addressing this would require more sophisticated sensor pairing or sensor fusion techniques.

6.3. Future Work

In the future, we hope to perform live validation experiments of our system. With a fully implemented system and working sensors, we plan to physically disrupt a sensor (e.g., pouring water or using a heated dish) and verify that the system flags those times as anomalies. These experiments will show that the system performs well on real, non-simulated data, and they will confirm that our system is functional in a live, operational setting.
While our experiments do not show a direct causation between improved anomaly detection and improved flood forecasting, we believe that this increase in performance will greatly benefit Floodwatch. More accurate and robust sensor data will lead to more accurate weather data, thus leading to better flood prediction. Therefore, another future research direction is to experimentally prove that improved anomaly detection leads to improved flood prediction. We plan to pursue this direction after we have deployed many more sensors onto our platform. We also invite others to show that this method improves performance in other smart-city IoT systems.

7. Conclusions

In this article, we present a novel approach to anomaly detection in an IoT flood forecasting environment. We propose an anomaly detection system that compares device readings with other nearby devices by fusing the data into a new time series. Through experiments on nine different ML models, we demonstrate that our anomaly detection system improves anomaly detection performance by 0.083 (or 10.8%) in terms of F1-score on synthetic data based on real-world weather data. Our anomaly detection system is applicable beyond flood forecasting and is suitable for sensor networks in several other relevant areas, such as climate monitoring and environmental risk assessment.

Author Contributions

Conceptualization, A.M. and A.K.; methodology, A.M. and A.K.; software, A.M. and A.K.; validation, A.M., A.K. and N.D.; formal analysis, A.M.; investigation, A.M.; resources, A.M.; data curation, A.M.; writing—original draft preparation, A.M., A.K. and N.D.; writing—review and editing, A.M. and N.D.; visualization, A.M.; supervision, N.R.N. and F.F.; project administration, N.R.N. and F.F.; funding acquisition, N.R.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research was sponsored by the U.S. National Science Foundation under grant #2026050.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this research are available by request.

Acknowledgments

We would like to acknowledge Sidhardh Burre from the University of Virginia for his valuable assistance with the training code and for his insightful discussions on the results.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
IoT: Internet of Things
ML: Machine Learning
DBSCAN: Density-Based Spatial Clustering of Applications with Noise

References

  1. Bangalore, M.; Smith, A.; Veldkamp, T. Exposure to floods, climate change, and poverty in Vietnam. Econ. Disasters Clim. Change 2019, 3, 79–99. [Google Scholar]
  2. Luo, T.; Maddocks, A.; Iceland, C.; Ward, P.; Winsemius, H. World’s 15 Countries with the Most People Exposed to River Floods. 2015. Available online: https://www.wri.org/insights/worlds-15-countries-most-people-exposed-river-floods (accessed on 5 February 2025).
  3. Khodadad, M.; Aguilar-Barajas, I.; Khan, A.Z. Green Infrastructure for Urban Flood Resilience: A Review of Recent Literature on Bibliometrics, Methodologies, and Typologies. Water 2023, 15, 523. [Google Scholar] [CrossRef]
  4. Chitwatkulsiri, D.; Miyamoto, H. Real-Time Urban Flood Forecasting Systems for Southeast Asia—A Review of Present Modelling and Its Future Prospects. Water 2023, 15, 178. [Google Scholar] [CrossRef]
  5. Mendoza-Cano, O.; Aquino-Santos, R.; López-de la Cruz, J.; Edwards, R.M.; Khouakhi, A.; Pattison, I.; Rangel-Licea, V.; Castellanos-Berjan, E.; Martinez-Preciado, M.; Rincón-Avalos, P.; et al. Experiments of an IoT-based wireless sensor network for flood monitoring in Colima, Mexico. J. Hydroinformatics 2021, 23, 385–401. [Google Scholar]
  6. Gupta, A.; Kim, A.; Karande, A.; Yan, S.; Manandhar, S.; Nguyen, N.R. Validating Crowdsourced Flood Images using Machine Learning and Real-time Weather Data. In Proceedings of the 2022 IEEE 16th International Conference on Big Data Science and Engineering (BigDataSE), Wuhan, China, 9–11 December 2022; pp. 7–12. [Google Scholar]
  7. Erhan, L.; Ndubuaku, M.; Di Mauro, M.; Song, W.; Chen, M.; Fortino, G.; Bagdasar, O.; Liotta, A. Smart anomaly detection in sensor systems: A multi-perspective review. Inf. Fusion 2021, 67, 64–79. [Google Scholar]
  8. Pang, G.; Shen, C.; Cao, L.; Hengel, A.V.D. Deep learning for anomaly detection: A review. ACM Comput. Surv. (CSUR) 2021, 54, 1–38. [Google Scholar]
  9. Samara, M.A.; Bennis, I.; Abouaissa, A.; Lorenz, P. A survey of outlier detection techniques in IoT: Review and classification. J. Sens. Actuator Netw. 2022, 11, 4. [Google Scholar] [CrossRef]
  10. Li, K.L.; Huang, H.K.; Tian, S.F.; Xu, W. Improving one-class SVM for anomaly detection. In Proceedings of the 2003 International Conference on Machine Learning and Cybernetics (IEEE Cat. No. 03EX693), Xi’an, China, 5 November 2003; Volume 5, pp. 3077–3081. [Google Scholar]
  11. Liu, F.T.; Ting, K.M.; Zhou, Z.H. Isolation forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy, 15–19 December 2008; pp. 413–422. [Google Scholar]
  12. Darban, Z.Z.; Webb, G.I.; Pan, S.; Aggarwal, C.C.; Salehi, M. Deep learning for time series anomaly detection: A survey. arXiv 2022, arXiv:2211.05244. [Google Scholar]
  13. Schmidl, S.; Wenig, P.; Papenbrock, T. Anomaly detection in time series: A comprehensive evaluation. Proc. VLDB Endow. 2022, 15, 1779–1797. [Google Scholar]
  14. Goh, J.; Adepu, S.; Tan, M.; Lee, Z.S. Anomaly detection in cyber physical systems using recurrent neural networks. In Proceedings of the 2017 IEEE 18th International Symposium on High Assurance Systems Engineering (HASE), Singapore, 12–14 January 2017; pp. 140–145. [Google Scholar]
  15. Chauhan, S.; Vig, L. Anomaly detection in ECG time signals via deep long short-term memory networks. In Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Paris, France, 19–21 October 2015; pp. 1–7. [Google Scholar]
  16. Audibert, J.; Michiardi, P.; Guyard, F.; Marti, S.; Zuluaga, M.A. Usad: Unsupervised anomaly detection on multivariate time series. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual, CA, USA, 6–10 July 2020; pp. 3395–3404. [Google Scholar]
  17. Sakurada, M.; Yairi, T. Anomaly detection using autoencoders with nonlinear dimensionality reduction. In Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis, Gold Coast, Australia, 2 December 2014; pp. 4–11. [Google Scholar]
  18. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078. [Google Scholar]
  19. Malhotra, P.; Ramakrishnan, A.; Anand, G.; Vig, L.; Agarwal, P.; Shroff, G. LSTM-based encoder-decoder for multi-sensor anomaly detection. arXiv 2016, arXiv:1607.00148. [Google Scholar]
  20. Tuli, S.; Casale, G.; Jennings, N.R. Tranad: Deep transformer networks for anomaly detection in multivariate time series data. arXiv 2022, arXiv:2201.07284. [Google Scholar]
  21. Xu, J.; Wu, H.; Wang, J.; Long, M. Anomaly transformer: Time series anomaly detection with association discrepancy. arXiv 2021, arXiv:2110.02642. [Google Scholar]
  22. Deng, A.; Hooi, B. Graph neural network-based anomaly detection in multivariate time series. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 2–9 February 2021; Volume 35, pp. 4027–4035. [Google Scholar]
  23. Jabez, J.; Muthukumar, B. Intrusion Detection System (IDS): Anomaly Detection Using Outlier Detection Approach. Procedia Comput. Sci. 2015, 48, 338–346. [Google Scholar]
  24. Alzahrani, R.J.; Alzahrani, A. A novel multi algorithm approach to identify network anomalies in the IoT using Fog computing and a model to distinguish between IoT and Non-IoT devices. J. Sens. Actuator Netw. 2023, 12, 19. [Google Scholar] [CrossRef]
  25. Zidi, S.; Moulahi, T.; Alaya, B. Fault detection in wireless sensor networks through SVM classifier. IEEE Sens. J. 2017, 18, 340–347. [Google Scholar]
  26. Cauteruccio, F.; Fortino, G.; Guerrieri, A.; Liotta, A.; Mocanu, D.C.; Perra, C.; Terracina, G.; Vega, M.T. Short-long term anomaly detection in wireless sensor networks based on machine learning and multi-parameterized edit distance. Inf. Fusion 2019, 52, 13–30. [Google Scholar]
  27. Holst, C.A.; Lohweg, V. A Redundancy Metric Set within Possibility Theory for Multi-Sensor Systems. Sensors 2021, 21, 2508. [Google Scholar] [CrossRef]
  28. Elmenreich, W. An Introduction to Sensor Fusion; Vienna University of Technology: Vienna, Austria, 2002; Volume 502, pp. 1–28. [Google Scholar]
  29. Alam, F.; Mehmood, R.; Katib, I.; Albogami, N.N.; Albeshri, A. Data fusion and IoT for smart ubiquitous environments: A survey. IEEE Access 2017, 5, 9533–9554. [Google Scholar]
  30. Cauteruccio, F.; Fortino, G.; Guerrieri, A.; Terracina, G. Discovery of hidden correlations between heterogeneous wireless sensor data streams. In Proceedings of the Internet and Distributed Computing Systems: 7th International Conference, IDCS 2014, Calabria, Italy, 22–24 September 2014; pp. 383–395. [Google Scholar]
  31. de Farias, C.M.; Pirmez, L.; Delicato, F.C.; Pires, P.F.; Guerrieri, A.; Fortino, G.; Cauteruccio, F.; Terracina, G. A multisensor data fusion algorithm using the hidden correlations in Multiapplication Wireless Sensor data streams. In Proceedings of the 2017 IEEE 14th International Conference on Networking, Sensing and Control (ICNSC), Calabria, Italy, 16–18 May 2017; pp. 96–102. [Google Scholar]
  32. Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the KDD, Portland, OR, USA, 2–4 August 1996; Volume 96, pp. 226–231. [Google Scholar]
  33. Edmonds, J. Maximum matching and a polyhedron with 0, 1-vertices. J. Res. Natl. Bur. Stand. B 1965, 69, 55–56. [Google Scholar]
  34. Plummer, M.D.; Lovász, L. Matching Theory; Elsevier: Amsterdam, The Netherlands, 1986. [Google Scholar]
  35. Micali, S.; Vazirani, V.V. An O(√|V|·|E|) algorithm for finding maximum matching in general graphs. In Proceedings of the 21st Annual Symposium on Foundations of Computer Science (SFCS 1980), Syracuse, NY, USA, 13–15 October 1980; pp. 17–27. [Google Scholar]
  36. DeMedeiros, K.; Hendawi, A.; Alvarez, M. A survey of AI-based anomaly detection in IoT and sensor networks. Sensors 2023, 23, 1352. [Google Scholar] [CrossRef] [PubMed]
  37. Su, Y.; Zhao, Y.; Niu, C.; Liu, R.; Sun, W.; Pei, D. Robust anomaly detection for multivariate time series through stochastic recurrent neural network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2828–2837. [Google Scholar]
  38. Goh, J.; Adepu, S.; Junejo, K.N.; Mathur, A. A dataset to support research in the design of secure water treatment systems. In Proceedings of the Critical Information Infrastructures Security: 11th International Conference, CRITIS 2016, Paris, France, 10–12 October 2016; pp. 88–99. [Google Scholar]
  39. Ahmed, C.M.; Palleti, V.R.; Mathur, A.P. WADI: A water distribution testbed for research in the design of secure cyber physical systems. In Proceedings of the 3rd International Workshop on Cyber-Physical Systems for Smart Water Networks, Pittsburgh, PA, USA, 21 April 2017; pp. 25–28. [Google Scholar]
  40. Hundman, K.; Constantinou, V.; Laporte, C.; Colwell, I.; Soderstrom, T. Detecting spacecraft anomalies using lstms and nonparametric dynamic thresholding. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 387–395. [Google Scholar]
  41. Bamaqa, A.; Sedky, M.; Bosakowski, T.; Bakhtiari Bastaki, B.; Alshammari, N.O. SIMCD: SIMulated crowd data for anomaly detection and prediction. Expert Syst. Appl. 2022, 203, 117475. [Google Scholar] [CrossRef]
  42. Tanaka, K.; Kudo, M.; Kimura, K. Sensor Data Simulation for Anomaly Detection of the Elderly Living Alone. arXiv 2023. [Google Scholar] [CrossRef]
  43. Pedro Mena, R.A.B.; Kerby, L. Detecting Anomalies in Simulated Nuclear Data Using Autoencoders. Nucl. Technol. 2024, 210, 112–125. [Google Scholar] [CrossRef]
  44. Steinbuss, G.; Böhm, K. Benchmarking unsupervised outlier detection with realistic synthetic data. Acm Trans. Knowl. Discov. Data (TKDD) 2021, 15, 1–20. [Google Scholar] [CrossRef]
  45. WeatherAPI. Free Weather API—WeatherAPI.com. 2023. Available online: https://www.weatherapi.com (accessed on 5 February 2025).
  46. Cook, A.A.; Mısırlı, G.; Fan, Z. Anomaly detection for IoT time-series data: A survey. IEEE Internet Things J. 2019, 7, 6481–6494. [Google Scholar] [CrossRef]
  47. Chandola, V.; Banerjee, A.; Kumar, V. Anomaly detection: A survey. ACM Comput. Surv. (CSUR) 2009, 41, 1–58. [Google Scholar] [CrossRef]
  48. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  49. Jeong, Y.; Yang, E.; Ryu, J.H.; Park, I.; Kang, M. AnomalyBERT: Self-Supervised Transformer for Time Series Anomaly Detection using Data Degradation Scheme. arXiv 2023, arXiv:2305.04468. [Google Scholar]
  50. Xu, H.; Pang, G.; Wang, Y.; Wang, Y. Deep Isolation Forest for Anomaly Detection. IEEE Trans. Knowl. Data Eng. 2023, 35, 12591–12604. [Google Scholar] [CrossRef]
  51. Xu, H.; Wang, Y.; Jian, S.; Liao, Q.; Wang, Y.; Pang, G. Calibrated one-class classification for unsupervised time series anomaly detection. arXiv 2022, arXiv:2207.12201. [Google Scholar] [CrossRef]
  52. Ruff, L.; Vandermeulen, R.; Goernitz, N.; Deecke, L.; Siddiqui, S.A.; Binder, A.; Müller, E.; Kloft, M. Deep one-class classification. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 4393–4402. [Google Scholar]
  53. Garg, A.; Zhang, W.; Samaran, J.; Savitha, R.; Foo, C.S. An evaluation of anomaly detection and diagnosis in multivariate time series. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 2508–2517. [Google Scholar]
  54. Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; Long, M. Timesnet: Temporal 2d-variation modeling for general time series analysis. arXiv 2022, arXiv:2210.02186. [Google Scholar]
  55. Dagar, R.; Som, S.; Khatri, S.K. Smart farming—IoT in agriculture. In Proceedings of the 2018 International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, India, 11–12 July 2018; pp. 1052–1056. [Google Scholar]
  56. Johnston, S.J.; Basford, P.J.; Bulot, F.M.; Apetroaie-Cristea, M.; Easton, N.H.; Davenport, C.; Foster, G.L.; Loxham, M.; Morris, A.K.; Cox, S.J. City scale particulate matter monitoring using LoRaWAN based air quality IoT devices. Sensors 2019, 19, 209. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Floodwatch web platform.
Figure 2. Overview of sensor verification system in the context of floodwatch.io framework.
Figure 3. An optimal pairing of our deployed sensors in the city of Da Nang, Vietnam. Each sensor’s readings are fused with the readings of its pair.
Figure 4. Sensor readings are combined by linearly interpolating the readings, and then taking the differences at specified intervals (highlighted green).
Figure 5. The temperature, humidity, and precipitation readings for a simulated device with a constant offset of +1 °C and +3 humidity %. The precipitation readings are affected by Gaussian white noise.
Figure 6. Examples of a point anomaly (top) and a contextual anomaly (bottom). Anomalies are highlighted in yellow.
Table 1. Precision, recall, and F1-scores for various anomaly detection models trained with and without combined data. Highest F1 in each column is bolded, second highest is underlined, and third highest is italicized.
| Model | Precision (Without) | Recall (Without) | F1-Score (Without) | Precision (With) | Recall (With) | F1-Score (With) |
|---|---|---|---|---|---|---|
| AnomalyBERT [49] | 0.8455 | 0.6817 | 0.7548 | 0.9194 | 0.6963 | 0.7925 |
| AnomalyTransformer [21] | 0.7159 | 0.8020 | 0.7565 | 0.7604 | 0.8172 | 0.7878 |
| COUTA [51] | 0.7387 | 0.7790 | 0.7583 | 0.7077 | 0.8628 | 0.7776 |
| DeepIsolationForest [50] | 0.7317 | 0.7935 | 0.7614 | 0.7822 | 0.8545 | 0.8168 |
| DeepSVDD [52] | 0.6804 | 0.5922 | 0.6332 | 0.7463 | 0.7421 | 0.7442 |
| TcnED [53] | 0.7361 | 0.5094 | 0.6021 | 0.8397 | 0.8060 | 0.8224 |
| TimesNet [54] | 0.4085 | 0.5853 | 0.4812 | 0.4297 | 0.7696 | 0.5514 |
| TranAD [20] | 0.7720 | 0.7108 | 0.7401 | 0.8371 | 0.7575 | 0.7953 |
| USAD [16] | 0.7190 | 0.5785 | 0.6411 | 0.7627 | 0.8153 | 0.7881 |
Table 2. Increase in precision, recall, and F1-score for each model. The average increase across all models is shown at the bottom.
| Model | Precision Increase | Recall Increase | F1-Score Increase |
|---|---|---|---|
| AnomalyBERT | 0.0739 | 0.0146 | 0.0377 |
| AnomalyTransformer | 0.0445 | 0.0152 | 0.0313 |
| COUTA | −0.0310 | 0.0838 | 0.0193 |
| DeepIsolationForest | 0.0505 | 0.0610 | 0.0554 |
| DeepSVDD | 0.0659 | 0.1499 | 0.1110 |
| TcnED | 0.1036 | 0.2966 | 0.2203 |
| TimesNet | 0.0212 | 0.1843 | 0.0702 |
| TranAD | 0.0651 | 0.0467 | 0.0552 |
| USAD | 0.0437 | 0.2368 | 0.1470 |
| avg. | 0.0486 | 0.1210 | 0.0830 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.


