On Evaluating IoT Data Trust via Machine Learning
Round 1
Reviewer 1 Report
The paper presents a data synthesis method used to enrich IoT time-series datasets by adding generated labeled untrustworthy data. Since most existing IoT-related datasets contain mainly trustworthy recorded values, the main benefit of this method is its ability to provide a more balanced dataset for training further ML models for data trust evaluation.
The paper is well written and contains many useful details about the proposed method. The authors implement an extensive validation process by comparing the results with a state-of-the-art method such as Drift. They also use the generated dataset to demonstrate the poor performance of an unsupervised evaluation method based on clustering. However, the paper also has some flaws that should be addressed by the authors, as follows.
1. The authors do not discuss the limitations of the method. E.g., how is the method applied to a binary sensor (e.g., door open/close, smoke detected/clear, etc.) or a discrete sensor (e.g., a throttle position sensor)?
2. As the authors mentioned [143], “Here, we propose a novel method to synthesize realistic untrustworthy IoT sensor data using real-world data collected through IoT sensor networks.” It is not clear from the manuscript what types of untrustworthy data the method covers. E.g., the noise generated by a malfunctioning sensor may contain many outliers not considered by the proposed methodology. Moreover, the data injected by malicious actors can perfectly mimic trusted data but at the wrong moment in time. For example, temperature sensor data can be replaced with a pre-recorded temperature series that indicates no change in order to mask an intentional fire, etc.
3. The title of the paper is too general and does not reflect the content of the manuscript. The authors are advised to change it to something more focused on the proposed method, which is mainly related to generating IoT datasets.
The quality of language is fine.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
1) The authors need to clearly summarize the main contributions in the introduction.
2) There are several deep-learning approaches for IoT data trust evaluation. The authors need to survey more recent DL approaches and compare with the existing DL solutions.
3) The authors need to consider other datasets for the proposed solution. The dataset used is too old.
4) What are the most important performance metrics? The authors need to provide more discussions about the performance evaluations in detail.
5) The authors need to clearly describe the advantages of RWI and Drift in Section 3.
Enough
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
This manuscript presents a method for evaluating the trustworthiness of IoT data using a newly proposed time-series sensor data synthesis method called RWI. The authors compare the RWI method with the Drift method, emphasizing RWI's ability to generate untrustworthy data that closely resembles trustworthy data. The paper also introduces a set of correlation features that capture spatiotemporal patterns in the data. The effectiveness of the proposed methods is evaluated using machine learning models on the Intel Lab dataset.
Strengths:
The introduction of the RWI method provides a novel approach to generating synthetic untrustworthy data, addressing the challenge of obtaining labeled untrustworthy IoT data.
The paper's emphasis on the importance of spatiotemporal correlations in IoT data trustworthiness evaluation is commendable.
The use of visualization techniques, such as UMAP, offers a clear visual representation of the effectiveness of the proposed methods.
The manuscript provides a comprehensive evaluation using both supervised and unsupervised ML models, offering insights into their performance in the context of IoT data trustworthiness.
Weaknesses:
The paper relies heavily on synthetic methods without validating these against real-world untrustworthy data.
The DST features' underperformance is mentioned but not deeply explored, leaving a gap in understanding.
The evaluations are primarily based on a single dataset (Intel Lab), which might not capture the diversity of real-world IoT environments.
There's no discussion of the computational costs of the RWI method.
Recommendations:
It would be beneficial to validate the RWI and Drift methods against real-world untrustworthy data to ensure their practical applicability.
A deeper exploration of why DST features underperform would provide a more comprehensive understanding.
Consider evaluating the proposed methods on diverse datasets to ensure their robustness across different IoT scenarios.
Discuss the computational efficiency of the RWI method, especially if it's intended for real-time applications.
A comparison with state-of-the-art methods or external benchmarks would provide a clearer context for the proposed method's effectiveness.
In conclusion, the manuscript offers valuable insights into the evaluation of IoT data trustworthiness and introduces promising methods. With the recommended improvements, the paper has the potential to make a significant contribution to the field.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
The revised version of the manuscript adds some necessary clarifications to the presented method. The title was changed to better reflect the work described in the paper. The authors succeeded in adding explanations of the limitations of the method and highlighting the objectives of the developed method. In conclusion, the general improvements to the manuscript allow me to recommend it for publication in its present form.
The quality of language is fine.
Reviewer 2 Report
I'm satisfied with this revision.
Enough