A Blockchain-Based Framework for Secure Data Stream Dissemination in Federated IoT Environments
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The paper presents a novel framework for secure and reliable data stream dissemination in Federated IoT environments, leveraging blockchain, Apache Kafka, and microservices. While the work is promising, several areas could be improved to enhance its clarity, technical rigor, and practical relevance.
The abstract is informative but could be more concise, focusing on the problem addressed, key contributions, and results without overloading it with technical terminology. The introduction effectively identifies challenges in Federated IoT environments but lacks a clear and strong problem statement. It would benefit from explicitly outlining the research gap and providing a focused motivation for the proposed solution, supported by real-world use cases.
The framework design section is detailed but dense, with technical explanations that may overwhelm readers unfamiliar with the topic. Simplifying the diagrams, adding legends, and including high-level explanations before diving into technical details would improve readability. Furthermore, illustrative examples, especially for complex concepts like fingerprint sampling, would enhance understanding.
The performance evaluation is thorough, with well-defined metrics, but the experimental setup descriptions are lengthy. Summarizing hardware/software configurations in a table and clearly stating the rationale for the chosen setups would improve clarity. The results are comprehensive but could be better presented using summary tables or graphs highlighting key findings. Additionally, addressing potential limitations, such as scalability under high workloads or the impact of network failures, would strengthen the evaluation.
The paper lacks a dedicated security analysis section, which is critical for a framework focused on secure data dissemination. A formal discussion on potential attack vectors, such as data tampering and DDoS attacks, and how the framework mitigates them, is necessary. Including a trade-off analysis between security and performance (e.g., encryption overhead) would also add value.
Lastly, discussing the framework's adaptability to other domains like smart cities or healthcare would further highlight its practical relevance.
Author Response
- The abstract is informative but could be more concise, focusing on the problem addressed, key contributions, and results without overloading it with technical terminology.
Answer: Agree. We have corrected the abstract as recommended (Lines 4-6, 11-27).
- The introduction effectively identifies challenges in Federated IoT environments but lacks a clear and strong problem statement. It would benefit from explicitly outlining the research gap and providing a focused motivation for the proposed solution, supported by real-world use cases.
Answer: We have reorganized the Introduction section and added a real-world use case: a state border monitoring system based mainly on distributed multispectral sensors (IoT devices), in which the five organizations that make up the federation participate (the Army, Police, Border Guard, Fire Department, and Emergency Medical Services). Line 1166, highlighted.
- The framework design section is detailed but dense, with technical explanations that may overwhelm readers unfamiliar with the topic. Simplifying the diagrams, adding legends, and including high-level explanations before diving into technical details would improve readability. Furthermore, illustrative examples, especially for complex concepts like fingerprint sampling, would enhance understanding.
Answer: A high-level explanation of our framework has been included at the beginning of Section 3 to enhance clarity prior to delving into the technical specifics. Lines 253-254, highlighted.
- The performance evaluation is thorough, with well-defined metrics, but the experimental setup descriptions are lengthy. Summarizing hardware/software configurations in a table and clearly stating the rationale for the chosen setups would improve clarity. The results are comprehensive but could be better presented using summary tables or graphs highlighting key findings.
Answer: To enhance clarity, we incorporated tables that summarize the hardware and software configurations for Setup I (Section 5.1) and Setup II (Section 5.2). Additionally, in Section 5.3, which covers the benchmarking of the Data Dissemination System, we outlined the configuration parameters applied across all scenarios. Furthermore, the results and key findings presented in Section 6 (Discussion) have been organized into a summary table. Tables 2, 3, 4, 20, and 21, highlighted.
- Additionally, addressing potential limitations, such as scalability under high workloads or the impact of network failures, would strengthen the evaluation.
Answer: We agree with this conclusion. We are aware of the dynamic nature of real IoT devices. Moreover, an entity mobility requirement was included in the architecture of our pluggable framework. This allows us to conduct experiments aimed at maximizing throughput (scalability) by applying various configurations to the components defining each layer of our environment. Additionally, our future work will include more in-depth studies of the framework's sensitivity to initial parameters (configuration). Furthermore, we are going to incorporate tests involving Value of Information, Data Quality, and Context Dissemination, as well as the integration of the Streams Microservice Layer with resource orchestration and automation platforms, allowing for dynamic microservice allocation based on varying customer Key Performance Indicators.
- The paper lacks a dedicated security analysis section, which is critical for a framework focused on secure data dissemination. A formal discussion on potential attack vectors, such as data tampering and DDoS attacks, and how the framework mitigates them, is necessary. Including a trade-off analysis between security and performance (e.g., encryption overhead) would also add value.
Answer: We had planned to present an extended security analysis in a separate paper, but we agree that a basic security assessment is essential. We added Section V, a dedicated high-level security and reliability analysis that considers a diverse range of threats across the Application, Network, and Perception layers of the IoT system. Our future work will focus on establishing a formal multi-level security model that effectively addresses data confidentiality and integrity protection requirements within our framework. A new section was added (highlighted): Security and reliability risk assessment.
- Lastly, discussing the framework's adaptability to other domains like smart cities or healthcare would further highlight its practical relevance.
Answer: We have added a real use case - monitoring and protecting the national border with the participation of 5 units of different organizations forming a federation. Line 1166 highlighted.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
In brief, a very interesting article, but the links to IoT may need further clarification.
Verdict: accept with minor modifications
The article starts with the currently fashionable ‘data and IoT soup’.
However, that does not do justice to the rest of the article.
Instead, it seems clear from the list of sources / application domains that we are not talking about just IoT, but instead about a collection of information streams originating from / to be classified as not only IoT, but instead a combination of IoT, M2M (machine to machine communications), and a variety of data sets and streams.
For example, one should be hesitant about, if not question, the use of the term IoT for data collected by a drone (a combination of video streams, recorded processed data, fused sensor data) or (‘intelligent’) healthcare.
Similarly, the idea that any data is just data is at least misleading; data should always be seen as specific, as having value within its domain and only in few cases as data that can be used for other purposes: not this mistaken idea that any data can be used for just anything.
It is suspected that the comments given above are particularly valid for the military domain, which mixes in other domains like weather, transportation, even health/medical.
The article rightfully leaves quite a few questions for further research: that is considered a strong and honest scientific point of the paper.
Author Response
- The article starts with the currently fashionable ‘data and IoT soup’. However, that does not do justice to the rest of the article. Instead, it seems clear from the list of sources / application domains that we are not talking about just IoT, but instead about a collection of information streams originating from / to be classified as not only IoT, but instead a combination of IoT, M2M (machine to machine communications), and a variety of data sets and streams. For example, one should be hesitant about, if not question, the use of the term IoT for data collected by a drone (a combination of video streams, recorded processed data, fused sensor data) or (‘intelligent’) healthcare.
Answer: We agree with this comment. We are not just talking about IoT, but rather the entire IoT ecosystem as a system of systems, i.e., a collection of IoT devices interacting with other objects and systems such as Cyber-Physical Systems, M2M and C2 systems (powered by streams of information from various devices, not just IoT). Section 1 was modified.
- Similarly, the idea that any data is just data is at least misleading; data should always be seen as specific, as having value within its domain and only in few cases as data that can be used for other purposes: not this mistaken idea that any data can be used for just anything. It is suspected that the comments given above are particularly valid for the military domain, which mixes in other domains like weather, transportation, even health/medical.
Answer: Yes, we agree. In the revised version of the paper, we have added another of the possible use cases for a data distribution system running in a federated IoT ecosystem environment, which involves a national border monitoring system. (Line 1166 highlighted)
- The article rightfully leaves quite a few questions for further research: that is considered a strong and honest scientific point of the paper.
Answer: We agree with this statement. We planned to thoroughly evaluate security and reliability risk assessments, considering a diverse range of threats across the Application, Network, and Perception layers of the IoT system. We aim to establish a formal multi-level security model that effectively addresses data confidentiality and integrity protection requirements. In addition, during our future research we will conduct experiments aimed at maximizing throughput by applying various configurations to the components defining each layer of our environment. We will also undertake more in-depth studies focusing on the Value of Information, Data Quality, and Context Dissemination. Additionally, we plan to deploy lightweight and delay-tolerant consensus algorithms within the Distributed Ledger Layer, which employs a directed-acyclic graph structure for achieving consensus.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The paper identifies key problem areas in federated IoT systems, such as data integrity, identity management, real-time processing, and resource allocation. However, it would benefit from more statistical depth, broader test coverage, and clearer explanations of results.
- The introduction delays identifying the research gap. The question of securely and reliably disseminating data in federated IoT is obscured by details until after paragraph 10.
- Claims of solving several "gaps" (lines 103–104) are not qualified with how extensively they are solved or what limitations remain.
- The contributions (lines 114–126) are listed late, and the numbering would be clearer if more explicitly introduced as "main contributions."
- The use of Kafka, Hyperledger, and microservices is presented as a given. There's no justification for why these technologies were chosen over alternatives.
- The framework includes many layers, potentially adding unnecessary complexity, such as the Fingerprint Enrichment Layer and Identity Streams Microservice, risking overengineering without clear justification for each component.
- The paper fails to clearly justify the selection of technologies like Kafka and Hyperledger Fabric over alternatives, such as MQTT for IoT or other suitable DLTs for IoT.
- The authors state that the data is encrypted and authenticated, but the framework lacks robust protection at the Data Queue Layer, relying solely on encryption. This approach assumes pre-shared keys are always secure, which may not be true in dynamic environments.
- Using one-time pre-shared keys requires a secure distribution mechanism, but none is provided. Without it, the verification framework fails.
- The evaluation uses static producers and consumers, overlooking the dynamic nature of real IoT devices. In practice, nodes may frequently join or leave networks; thus, handling mobility and churn is crucial. Ignoring these dynamics compromises the results' credibility.
- The authors vary consumer numbers but lack stress tests with many producers and consumers, leaving system performance under real-world stress unexamined.
- The paper mentions the "burst-at-startup technique" but doesn't explain how well it simulates less bursty, unpredictable, or continuous real-world data flows.
- Reported latencies differ greatly between Java and Go microservices, potentially causing confusion. Clarification or explanation of these inconsistencies may be necessary.
- The focus on average latency and percentiles lacks deep statistical analysis of variability or confidence intervals. A more detailed analysis would clarify the results' stability and reliability.
- The study's focus on two microservices (Java and Go) may bias the results. Testing more configurations or third-party frameworks could enhance the methodology and allow for broader performance comparisons.
- The study lacks sensitivity testing, hindering the prediction of performance changes when parameters like message size, network bandwidth, or environmental conditions vary.
Comments on the Quality of English Language: Minor grammar issues, stylistic redundancies, and awkward phrasings.
Author Response
- The introduction delays identifying the research gap. The question of securely and reliably disseminating data in federated IoT is obscured by details until after paragraph 10.
Answer: The structure of Section 1 has been revised. The problem statement and research gaps are now presented at the beginning of the section.
- Claims of solving several "gaps" (lines 103–104) are not qualified with how extensively they are solved or what limitations remain.
Answer: The structure of Section 1 has been revised. The misleading phrase "solving several gaps" has been removed.
- The contributions (lines 114–126) are listed late, and the numbering would be clearer if more explicitly introduced as "main contributions."
Answer: The structure of Section 1 was modified. Numbering has been introduced as “main contributions”.
- The use of Kafka, Hyperledger, and microservices is presented as a given. There's no justification for why these technologies were chosen over alternatives.
Answer: Section 1 has been enhanced with additional paragraphs that present the rationale for integrating Apache Kafka, Kafka Streams API, and Hyperledger Fabric DLT into our framework. Furthermore, Sections 2 and 3 provide a more detailed exploration of these technologies, along with justifications for choosing them over alternative options. Lines 96–110.
- The framework includes many layers, potentially adding unnecessary complexity, such as the Fingerprint Enrichment Layer and Identity Streams Microservice, risking overengineering without clear justification for each component.
Answer: Our framework features a multilayered (pluggable) architecture that allows for the replacement of components (technologies) within the defined layers. The Fingerprint Enrichment Layer, in addition to supporting device behavior-based authentication (which will be a focus of future research), facilitates the handling of connectionless protocols such as UDP. This is achieved through a Protocol Forwarder that converts data streams into a connection-oriented format like TCP. This layer is crucial due to the limitations of the connection types supported by the technologies present in the Data Queue Layer, particularly since Apache Kafka operates over TCP.
- The paper fails to clearly justify the selection of technologies like Kafka and Hyperledger Fabric over alternatives, such as MQTT for IoT or other suitable DLTs for IoT.
Answer: In Section 1, we included paragraphs that outline the rationale for integrating Apache Kafka, the Kafka Streams API, and Hyperledger Fabric DLT with our framework. Furthermore, Sections 2 and 3 provide a more detailed description of these technologies, along with justifications for choosing them over alternative options. Lines 96–110.
- The authors state that the data is encrypted and authenticated, but the framework lacks robust protection at the Data Queue Layer, relying solely on encryption. This approach assumes pre-shared keys are always secure, which may not be true in dynamic environments.
Answer: Our work differentiates between the keys used for securing the communication channels of IoT devices and those used for ensuring data authenticity. The Data Queue Layer utilizes a serialization/deserialization process, allowing data streams to be format- and protocol-independent. The producer, through a data serialization mechanism, is solely responsible for converting data from a specific protocol into a byte representation. Conversely, the consumer determines how to interpret the received byte string from the broker via the deserialization process. This setup enables independent data exchange between producers and subscribers. It’s important to note that microservices do not need access to the data payload, as it may be encrypted using an encryption key established in advance between producers and subscribers (within a specific application, such as weather data). We propose utilizing pre-shared keys as an additional security layer to protect the Entity Features and Secure Seal Stores during the transit phase of the registration process, which is managed within the secure enclave of a given organization (Section 3.8). Within this process, we assume that the supply chain is monitored by the organization's administrators who oversee the registration process. This approach is designed to align with a Confidential Computing strategy that integrates Defense-in-Depth and Hardware Root of Trust concepts. A new section was added (highlighted): Security and reliability risk assessment.
- Using one-time pre-shared keys requires a secure distribution mechanism, but none is provided. Without it, the verification framework fails.
Answer: We propose the utilization of pre-shared keys as an additional layer of security to safeguard Entity Features and Secure Seal Stores during the transit phase that occurs in the registration process. The Simple Key Loader, an advanced secure cryptographic device, can be utilized to ensure safe distribution and storage in conjunction with the IoT device's Hardware Security Module when ingesting the pre-shared key.
- The evaluation uses static producers and consumers, overlooking the dynamic nature of real IoT devices. In practice, nodes may frequently join or leave networks; thus, handling mobility and churn is crucial. Ignoring these dynamics compromises the results' credibility.
Answer: Agree. We recognize the dynamic nature of real IoT devices, which is why we incorporated entity mobility requirements into the architecture of our pluggable framework. Our design focuses on enhancing interoperability, enabling seamless transfer of devices between the Publishers and Subscribers Layers across various organizations. This is achieved through the use of a Distributed Ledger Layer and multiple organizational Data Queue Layers, while simultaneously ensuring secure and reliable data dissemination in any configuration that adheres to predefined policies. This setup allows us to conduct experiments aimed at maximizing throughput by applying different configurations to the components that define each layer of our environment.
- The authors vary consumer numbers but lack stress tests with many producers and consumers, leaving system performance under real-world stress unexamined.
Answer: In our article, we presented studies that examined key internal operations of the microservice, during which we collected latency metrics while benchmarking a single microservice deployed on various platforms, including a Standalone Laptop, Raspberry Pi 3, and Raspberry Pi 5. We agree that conducting stress tests with multiple producers and consumers will provide valuable insights to accurately evaluate our framework, and this will be addressed in future research.
- The paper mentions the "burst-at-startup technique" but doesn't explain how well it simulates less bursty, unpredictable, or continuous real-world data flows.
Answer: In Setup II, Scenario III, we monitored power consumption while a continuous data stream was generated with a throughput of 400 messages per second. Future research will focus on varying configurations, specifically adjusting the number of producers and consumers, while also exploring less bursty and more unpredictable scenarios to better understand their impact on system performance.
- Reported latencies differ greatly between Java and Go microservices, potentially causing confusion. Clarification or explanation of these inconsistencies may be necessary.
Answer: The latencies associated with key internal operations vary between Java and Go microservices, primarily due to their execution environments. The Go microservice is statically compiled into machine code, resulting in a single executable that encompasses all necessary dependencies and can be executed directly on the operating system. This allows for native handling of network interfaces and TLS/SSL connections. In contrast, Java employs a two-step compilation process, characterized by the principle of "write once, run anywhere," utilizing the Java Virtual Machine and Just-In-Time (JIT) dynamic compilation. This comparison is detailed in Section 5.3 - Microservice Benchmarking.
- The focus on average latency and percentiles lacks deep statistical analysis of variability or confidence intervals. A more detailed analysis would clarify the results' stability and reliability.
Answer: A confidence interval metric has been added to the analysis. Tables with metrics were modified (highlighted). Lines 962, 966, 974, 1025, 1047 (highlighted).
- The study's focus on two microservices (Java and Go) may bias the results. Testing more configurations or third-party frameworks could enhance the methodology and allow for broader performance comparisons.
Answer: We concur with this statement. Importantly, our multi-layered pluggable architecture enables benchmarking across various configurations and third-party components (technologies), while concurrently minimizing operational deployment costs. Future research will concentrate on testing lightweight and delay-tolerant consensus algorithms within the Distributed Ledger Layer, such as the directed-acyclic graph structure.
- The study lacks sensitivity testing, hindering the prediction of performance changes when parameters like message size, network bandwidth, or environmental conditions vary.
Answer: We agree with this conclusion. Our future studies will delve deeper into the framework’s sensitivity related to initial parameters (configuration). Additionally, we plan to incorporate tests that involve the Value of Information, Data Quality, and Context Dissemination, as well as the integration of the Streams Microservice Layer with resource orchestration and automation platforms. This will facilitate dynamic allocation based on varying customer Key Performance Indicators. We have added a real use case - monitoring and protecting the national border with the participation of 5 units of different organizations forming a federation. Line 1166 highlighted.
Comments on the Quality of English Language: Minor grammar issues, stylistic redundancies, and awkward phrasings.
Answer: We have done a thorough review of the article in terms of language correction.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors addressed all my concerns.
Reviewer 3 Report
Comments and Suggestions for Authors
The authors have improved their work by properly addressing each comment. As a result, the paper has been significantly improved.