Sensor Fusion Enhances Anomaly Detection in a Flood Forecasting System
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
In this paper, the authors present a novel approach for improving anomaly detection in flood forecasting by utilizing sensor fusion techniques. The authors develop a system that pairs IoT weather sensors based on geographical proximity and fuses their readings into a new time-series dataset. The main idea is that nearby sensors should have similar readings, and deviations between them can be used to detect anomalies. The system is model-agnostic, meaning it can integrate different anomaly detection models without requiring structural modifications. The authors test their method on nine different machine learning models using one year of real weather data and report an average F1-score improvement of 10.8% over traditional approaches.
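For readers of this report, the pairing-and-fusion idea summarised above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the planar distance, the absolute-difference fusion, and the fixed threshold standing in for the Anomaly() model are all assumptions made for illustration.

```python
import math

def pair_nearest(coords):
    """Pair each sensor with its closest neighbour (planar distance, illustration only)."""
    pairs = {}
    for name, (x, y) in coords.items():
        candidates = [
            (math.hypot(x - ox, y - oy), other)
            for other, (ox, oy) in coords.items()
            if other != name
        ]
        pairs[name] = min(candidates)[1]
    return pairs

def fuse(a, b):
    """Fused series: absolute difference between paired readings at each time step."""
    return [abs(x - y) for x, y in zip(a, b)]

def flag(fused, threshold):
    """Hypothetical stand-in for an anomaly model: flag large deviations."""
    return [d > threshold for d in fused]

# Hypothetical sensor positions and rainfall readings.
coords = {"s1": (0.0, 0.0), "s2": (0.0, 1.0), "s3": (5.0, 5.0)}
readings = {
    "s1": [0.0, 1.2, 3.0, 2.5],
    "s2": [0.0, 1.0, 9.0, 2.4],
}

pairs = pair_nearest(coords)
flags = flag(fuse(readings["s1"], readings[pairs["s1"]]), threshold=1.0)
```

In this toy setup the third reading of the pair would be flagged, because the two nearby sensors disagree by far more than they do at the other time steps. Any detector could replace the threshold, which is the sense in which the approach is described as model-agnostic.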
The paper is well written and flows well. The context is very interesting and the approach is of great importance for the field of anomaly detection. However, there are some aspects that should be addressed in order to improve the quality of the paper:
- The advantages of the proposed method of this paper should be more highlighted.
- The authors claim that the system is able to improve real-world flood forecasting. However, the experiments primarily rely on synthetic data; therefore, it is not easy to judge the practical applicability of the results. More comments on this should be provided.
- An important assumption is that nearby sensors will have similar readings, which may not always be true, especially in the considered systems (see also suggested references which work directly in this direction). Variations in data can be caused by different factors (e.g., localized weather patterns). A thorough discussion on this should be provided.
- The scalability of the system concerns me, in the sense that more information could be provided. As far as I understand, the proposed approach requires pairing sensors based on geographical proximity, but no other factors are discussed (e.g., energy constraints or network latency) when this should be applied at scale. More comments should be provided.
- Figure 4 is not informative enough, it could be modified.
- It is not clear why only unsupervised models have been selected for the Anomaly() function in the experiments.
- What are the limitations of the proposed study?
- The conclusions should be extended and future lines of research should be discussed with more care.
- Finally, while the related literature section is informative, it really does not cover the majority of the recent works on anomaly detection. For instance, in the last years, several works have been proposed which also integrate sensor fusion for anomaly detection in different systems. The authors should cite the following references, as well as recent ones, to improve the body of knowledge of the paper:
- "Short-long term anomaly detection in wireless sensor networks based on machine learning and multi-parameterized edit distance." Information Fusion 52 (2019): 13-30.
- "Discovery of hidden correlations between heterogeneous wireless sensor data streams." *Internet and Distributed Computing Systems: 7th International Conference, IDCS Proceedings 7*. Springer International Publishing, 2014.
- "A multisensor data fusion algorithm using the hidden correlations in Multiapplication Wireless Sensor data streams." 2017 IEEE 14th International Conference on Networking, Sensing and Control (ICNSC). IEEE, 2017.
Author Response
The advantages of the proposed method of this paper should be more highlighted.
- We agree with this comment. We have added a brief sentence in the Introduction on page 3 lines 78-80 and added a paragraph highlighting advantages of this method on page 14 lines 469-474.
The authors claim that the system is able to improve real-world flood forecasting. However, the experiments primarily rely on synthetic data; therefore, it is not easy to judge the practical applicability of the results. More comments on this should be provided.
- Thank you for pointing this out. We agree with this comment. Therefore, we have made a few changes. We changed the title of the paper from “Sensor Fusion Enhances Anomaly Detection in Flood Forecasting Systems” to “Sensor Fusion Enhances Anomaly Detection in a Flood Forecasting System”. Additionally, we have added a “Future Work” section in the Discussion section and added a paragraph that highlights how future work can prove the practical applicability of the results. Those changes are on page 15 lines 502-515. We have also removed sentences that claim the practical applicability of the results.
An important assumption is that nearby sensors will have similar readings, which may not always be true, especially in the considered systems (see also suggested references which work directly in this direction). Variations in data can be caused by different factors (e.g., localized weather patterns). A thorough discussion on this should be provided.
- Thank you for your comment. We already have a paragraph discussing this, but we have added some more information to that paragraph on page 14 lines 482-489.
The scalability of the system concerns me, in the sense that more information could be provided. As far as I understand, the proposed approach requires pairing sensors based on geographical proximity, but no other factors are discussed (e.g., energy constraints or network latency) when this should be applied at scale. More comments should be provided.
- Thank you for pointing this out. We agree with this comment. Therefore, we have added a paragraph in the Discussion section that notes issues on energy, network issues, and sensor uptime on page 14 lines 490-495.
Figure 4 is not informative enough, it could be modified.
- We have edited the figure (page 8) to include more information on each color and what the lines mean.
It is not clear why only unsupervised models have been selected for the Anomaly() function in the experiments.
- Agreed. Therefore, we have added an explanation on page 12 lines 421-424 specifying that unsupervised models are selected due to the typically unlabeled nature of anomaly data.
What are the limitations of the proposed study?
- Thank you for your comment. We already have a limitations section in our “Discussion” section, but we have added more limitations to that section. The changes are on page 14 lines 485-489 and page 14 lines 490-495.
The conclusions should be extended and future lines of research should be discussed with more care.
- We agree with this comment. Therefore, we have added a “Future Work” section in our Discussion section that discusses future research questions. The changes are on page 15 lines 502-515.
Finally, while the related literature section is informative, it really does not cover the majority of the recent works on anomaly detection. For instance, in the last years, several works have been proposed which also integrate sensor fusion for anomaly detection in different systems. The authors should cite the following references, as well as recent ones, to improve the body of knowledge of the paper:
- "Short-long term anomaly detection in wireless sensor networks based on machine learning and multi-parameterized edit distance." Information Fusion 52 (2019): 13-30.
- "Discovery of hidden correlations between heterogeneous wireless sensor data streams." *Internet and Distributed Computing Systems: 7th International Conference, IDCS Proceedings 7*. Springer International Publishing, 2014.
- "A multisensor data fusion algorithm using the hidden correlations in Multiapplication Wireless Sensor data streams." 2017 IEEE 14th International Conference on Networking, Sensing and Control (ICNSC). IEEE, 2017.
- Thank you for providing these papers. We have cited these articles in our related works section. The changes are on page 4 lines 155-163 and page 5 lines 192-202.
Reviewer 2 Report
Comments and Suggestions for Authors
The paper presents a simple method to detect anomalies in a rainfall sensor network. The approach is based on the definition of sensor pairs and the identification of anomalous measures using the differences between the data of the two sensors.
The authors briefly summarise the current literature on anomaly detection but apparently disregard the large number of papers related to graph machine learning or graph neural networks (more than 3000 quotations in Google Scholar) applied to rainfall sensors. Indeed, the type of anomaly detection explored in the paper is a particular case of interpolation and has also connections with the well-studied transfer learning problem. The proposed method is just based on the definition of the sensor pairs and thus exploits only one additional sensor instead of the set of surrounding sensors.
Additionally, the paper is supposed to deal with an IoT network, while, in the study presented, there is nothing more than a standard set of sensors connected to the Internet, as is the case everywhere in the world.
Ultimately, the conclusion that two sensors can provide better anomaly detection than one is relatively obvious. However, we still do not know how this impacts the performance of flood forecasting systems mentioned in the title because no experiment on flood forecasting is mentioned in the paper.
All the references to Vietnam, including the explanation of why data are unavailable (lines 328-339), seem a bit redundant, given that the only example provided is entirely synthetic.
As to the text itself, the abstract should specify to what the 10.8% improvement refers.
I disagree with the definition of the two main approaches in lines 27-30. Flood prediction models are not “highly computationally intensive and expensive to compute” if based on machine learning approaches, and, in any case, they substantially differ from monitoring flood features through sensors, which can just show what is going on.
dgeo in eq. (4) is not explicitly defined.
Line 267 “an difference”.
Line 277 “available compute”.
Line 314 “We’ve” should be “We have”.
Line 347 “[? ]”.
Figs. 5 and 6 have nothing on the horizontal axis.
Section 4.2.1 The synthetic experiment is not precisely described. How was the Gaussian noise defined? How were negative values prevented? How “close” were the synthetic sensors? Was the WeatherAPI corroboration possible? What is the space and time resolution of such an API?
Author Response
The authors briefly summarise the current literature on anomaly detection but apparently disregard the large number of papers related to graph machine learning or graph neural networks (more than 3000 quotations in Google Scholar) applied to rainfall sensors. Indeed, the type of anomaly detection explored in the paper is a particular case of interpolation and has also connections with the well-studied transfer learning problem. The proposed method is just based on the definition of the sensor pairs and thus exploits only one additional sensor instead of the set of surrounding sensors.
- Thank you for pointing this out. We agree with this comment. Therefore, we have added a paragraph to our related works section detailing one of these graph neural network based anomaly detection models on pages 3-4 lines 120-126.
Additionally, the paper is supposed to deal with an IoT network, while, in the study presented, there is nothing more than a standard set of sensors connected to the Internet, as is the case everywhere in the world.
- Thank you for your comment. We have added some specification in the Experiments section that reflects what we said in the introduction (page 1 lines 38-39) when describing the sensor networks. The changes are on page 10 line 362.
Ultimately, the conclusion that two sensors can provide better anomaly detection than one is relatively obvious. However, we still do not know how this impacts the performance of flood forecasting systems mentioned in the title because no experiment on flood forecasting is mentioned in the paper.
- Thank you for pointing this out. We agree with this comment. Therefore, we have made a few changes. We changed the title of the paper from “Sensor Fusion Enhances Anomaly Detection in Flood Forecasting Systems” to “Sensor Fusion Enhances Anomaly Detection in a Flood Forecasting System”. Additionally, we have added a “Future Work” section in the Discussion section and added a paragraph that highlights how future work can prove the practical applicability of the results. Those changes are on page 15 lines 502-515. We have also removed sentences that claim the practical applicability of the results.
All the references to Vietnam, including the explanation of why data are unavailable (lines 328-339), seem a bit redundant, given that the only example provided is entirely synthetic.
- Agreed. We have removed those lines and added more information to the paragraph before it, as seen on page 10 lines 350-353.
As to the text itself, the abstract should specify to what the 10.8% improvement refers.
- Agreed. We have added “by improving F1-Score” to the abstract.
I disagree with the definition of the two main approaches in lines 27-30. Flood prediction models are not “highly computationally intensive and expensive to compute” if based on machine learning approaches, and, in any case, they substantially differ from monitoring flood features through sensors, which can just show what is going on.
- Thank you for pointing this out. We have changed “two main approaches” to “two main aspects” and removed the statement stating “highly computationally intensive and expensive to compute” on page 1 lines 26-27.
dgeo in eq. (4) is not explicitly defined.
- We have added a quick definition of dgeo to the text following eq. (4) on page 7 lines 260-261.
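The definition added to the paper is not reproduced in this record. As an illustration only, one common choice for a geographic distance between two sensors is the great-circle (haversine) distance; the sketch below assumes that reading of dgeo, and the function name, argument order, and Earth radius constant are assumptions rather than the authors' definition.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant

def d_geo(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine formula.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

A distance of this kind is symmetric and zero for identical coordinates, which is what a proximity-based pairing rule would need.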
Line 267 “an difference”.
- We have changed “an” to “and” on page 8 line 291.
Line 277 “available compute”.
- We have changed “available compute” to “amount of computation available” on page 8 line 301.
Line 314 “We’ve” should be “We have”.
- We have changed “We’ve” to “We have” on page 10 line 338.
Line 347 “[? ]”.
- Thank you for pointing this out. We have fixed an error in the .bib file, which has fixed the citation now on page 10 line 361.
Figs. 5 and 6 have nothing on the horizontal axis.
- Agreed. We have added “Time” to the horizontal axis of both Figures 5 and 6 (pages 11-12).
Section 4.2.1 The synthetic experiment is not precisely described. How was the Gaussian noise defined? How were negative values prevented? How “close” were the synthetic sensors? Was the WeatherAPI corroboration possible? What is the space and time resolution of such an API?
- Thank you for pointing this out. We agree with this comment. Therefore, we have added some sentences further describing the experiment. We address Gaussian noise, negative values, sensor distance, and resolution of the API at the beginning of Section 4.2.1 on page 11 lines 370-375. We address WeatherAPI corroboration on page 12 lines 410-411.
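The specific noise parameters and the handling of negative values are given in the revised paper, not in this record. Purely as an illustration of the kind of procedure the reviewer asks about, the sketch below derives a synthetic neighbouring series by adding zero-mean Gaussian noise and clipping at zero. The function name, the zero mean, the sigma value, the seed, and the clipping rule are all assumptions and may differ from what the authors did.

```python
import random

def synthetic_neighbour(series, sigma, seed=0):
    """Return a noisy copy of a rainfall series, clipped so no value is negative."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [max(0.0, x + rng.gauss(0.0, sigma)) for x in series]

# Hypothetical hourly rainfall values in mm.
base = [0.0, 0.0, 1.5, 4.0, 0.2]
neighbour = synthetic_neighbour(base, sigma=0.3)
```

Clipping is only one way to keep rainfall non-negative; it slightly biases dry periods upward, which is one reason the exact choice matters when describing a synthetic experiment.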
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors successfully addressed all of my comments, I have no more issues to note.
Author Response
The authors successfully addressed all of my comments, I have no more issues to note.
- Thank you for your comments.
Reviewer 2 Report
Comments and Suggestions for Authors
The authors have satisfactorily addressed the reviewer's comments. I still think the proposed method is not as effective as some of the current literature, but perhaps it is simpler.
The method's actual test will require a lot more data (and related anomalies) and thus may take years. I suggest the authors further stress this point, clearly indicating in the conclusions that the reported experiment is synthetic. Consequently, the sentence "one year of real weather data" in the abstract should be changed.
Author Response
The method's actual test will require a lot more data (and related anomalies) and thus may take years. I suggest the authors further stress this point, clearly indicating in the conclusions that the reported experiment is synthetic. Consequently, the sentence "one year of real weather data" in the abstract should be changed.
- Thank you. Agreed. We have changed "one year of real weather data" in the abstract to "synthetic data based on one year of real weather data." We have also added "Synthetic" to the conclusion on page 15 line 523.