Peer-Review Record

A Dynamic Intelligent Policies Analysis Mechanism for Personal Data Processing in the IoT Ecosystem

Big Data Cogn. Comput. 2020, 4(2), 9; https://doi.org/10.3390/bdcc4020009
by Konstantinos Demertzis 1,*, Konstantinos Rantos 1 and George Drosatos 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 21 March 2020 / Revised: 15 April 2020 / Accepted: 20 April 2020 / Published: 27 April 2020
(This article belongs to the Special Issue Advanced Data Mining Techniques for IoT and Big Data)

Round 1

Reviewer 1 Report

In this paper the authors introduce a complex system able to dynamically analyze the intents of a service provider in relation to the personal data processing of a given IoT device.

Results show that the proposed framework may assist the end-user in making an optimal decision when he connects a personal device to an IoT remote system or, more generally, to a cloud-based service. This approach thus protects users from undesired profiling and enhances their privacy.

The core sections of the paper are well written and include several technical details. The scientific methods and strategies are widely discussed as well.

On the other hand, I suggest improving the introductory part of the paper. Specifically, within the introduction, the authors refer to the "inability to understand the means and techniques used for collecting their personal data, how these data are being used and, most importantly, they are unaware of ways to take active measures to effectively protect their privacy". This concept is extremely relevant and should be better explained and discussed. In the recent past, several researchers have pointed out that users are usually unaware of profiling and tracking activities, which may be carried out by malicious stakeholders even when the user relies on (seemingly) anonymous procedures ("Passengers information in public transport and privacy: Can anonymous tickets prevent tracking?" Avoine et al. 2014). Another key aspect of privacy is that data often concern multiple individuals or, more specifically, that data which seem to involve only certain individuals in fact reveal information about others ("A Survey on Interdependent Privacy" Humbert et al. 2019).

I also believe an important aspect of IoT privacy (information routing within the IoT network) is not considered in this paper. Recent literature points out how this side-channel information can indeed be used to extract some of the sensitive data the paper tries to protect ("Inferring personal information from demand-response systems" M. A. Lisovich et al. 2010). It would be interesting to add a (brief) discussion on this aspect in the conclusion, perhaps pointing at the leading current IoT privacy-preserving routing schemes in the literature (such as "Private inter-network routing for wireless sensor networks and the Internet of Things" Palmieri et al. 2017) as a potential solution.

I suggest accepting this paper after minor revision.

Author Response

Dear Respected Reviewer

We would like to thank you for reviewing our manuscript and for your positive and helpful comments. We have revised the manuscript taking into account all the comments in order to improve its readability. We believe these changes have strengthened the rationale and importance of our study.

Cordially

Konstantinos Demertzis, Konstantinos Rantos and George Drosatos

Reviewer 1

In this paper, the authors introduce a complex system able to dynamically analyze the intents of a service provider in relation to the personal data processing of a given IoT device. Results show that the proposed framework may assist the end-user in making an optimal decision when he connects a personal device to an IoT remote system or, more generally, to a cloud-based service. This approach thus protects users from undesired profiling and enhances their privacy. The core sections of the paper are well written and include several technical details. The scientific methods and strategies are widely discussed as well.

Thank you for the remarks and for the careful reading.

On the other hand, I suggest improving the introductory part of the paper. Specifically, within the introduction, the authors refer to the "inability to understand the means and techniques used for collecting their personal data, how these data are being used and, most importantly, they are unaware of ways to take active measures to effectively protect their privacy". This concept is extremely relevant and should be better explained and discussed. In the recent past, several researchers have pointed out that users are usually unaware of profiling and tracking activities, which may be carried out by malicious stakeholders even when the user relies on (seemingly) anonymous procedures ("Passengers information in public transport and privacy: Can anonymous tickets prevent tracking?" Avoine et al. 2014). Another key aspect of privacy is that data often concern multiple individuals or, more specifically, that data which seem to involve only certain individuals in fact reveal information about others ("A Survey on Interdependent Privacy" Humbert et al. 2019).

Thank you for the remarks. We have rearranged the introduction section, which now includes a discussion of the active measures users can take to protect their privacy effectively.

I also believe an important aspect of IoT privacy (information routing within the IoT network) is not considered in this paper. Recent literature points out how this side-channel information can indeed be used to extract some of the sensitive data the paper tries to protect ("Inferring personal information from demand-response systems" M. A. Lisovich et al. 2010). It would be interesting to add a (brief) discussion on this aspect in the conclusion, perhaps pointing at the leading current IoT privacy-preserving routing schemes in the literature (such as "Private inter-network routing for wireless sensor networks and the Internet of Things" Palmieri et al. 2017) as a potential solution.

Thank you for this constructive comment. Following the reviewer’s suggestion, we have added to the conclusions section a brief discussion of side-channel information.

I suggest accepting this paper after minor revision.

Thank you

 

Reviewer 2 Report

This paper describes a solution for the analysis of the configurations of personal data access options in IoT devices to determine whether or not a third party may be able to perform user profiling with the obtained data. The solution is integrated into a previous framework of the authors for the management of user consents. The solution is based on a machine learning technique that is tested with an approach simulating how users provide consents in home entertainment devices.

A first problem with this paper is, in my opinion, the lack of clarity in the presentation of the ideas. The introduction does not provide a clear synthesis of the motivation for the work based on an analysis of related work. Nor does the related work section show what the contribution of the paper is with respect to already existing solutions. It is not even clear what the contribution of the present paper is with respect to the previous work of the authors. The two important techniques used in the work, namely Fuzzy Cognitive Maps (FCMs) and Extreme Learning Machines (ELMs), should be briefly described in a background section with appropriate examples. The reason why they were chosen should also be made very clear in that section. In general, I miss examples throughout the paper to help the reader understand the explanations given. For example, why don't you explain the semantic interpretation of some part of the graph in Figure 4? How do you interpret an arrow from unm to loh with a value of 0.15, etc.? What are the semantics of the arrows? (If one variable, or node, is true, what does the weight mean: the probability that the other may be deduced? The probability that the user has also given consent to the other one?)

Regarding soundness, section 4 presents a feature selection problem, but the existing ML approaches to this problem are not even referenced. The descriptions given throughout the paper are not sound and formal. See, for example, the first two paragraphs of section 5, where concepts such as state, feature, concept, neuron and property are used without any definition or even prior description. If you are using FCMs to simulate the ground truth, how do you evaluate the approach? How do you know that you are generating a dataset that represents this problem well?

An important part of the material of the paper is devoted to the description of FCMs and ELMs. An important part of section 5 is devoted to providing the complex mathematical background of FCMs and ELMs, which are not a contribution of the authors. Section 5 should concentrate on the description of the authors' contributions. I would expect a paper more focused on how these two techniques were used to solve the problem. It seems that FCMs are used to simulate the ground truth and ELMs are used to solve the classification problem for decision support. However, this is not very clearly explained. For example, the definition of the two classes of “Automated individual decision-making, including profiling” is not clear. How do you know (when constructing the ground truth) whether a case belongs to one class or the other? What do you mean by profiling? Does it not depend on the specific application?

Apart from the above comments, I found a sentence in the description of FCMs on page 6 (lines 151-153) that is identical to a sentence in the Wikipedia page on FCMs (https://en.wikipedia.org/wiki/Fuzzy_cognitive_map, third sentence in the Details section). Literal reuse of text in a definition is acceptable, but always with appropriate attribution to the source.

Author Response

Dear Respected Reviewer

We would like to thank you for reviewing our manuscript and for your positive and helpful comments. We have revised the manuscript taking into account all the comments in order to improve its readability. We believe these changes have strengthened the rationale and importance of our study.

Cordially

Konstantinos Demertzis, Konstantinos Rantos and George Drosatos

Reviewer 2

This paper describes a solution for the analysis of the configurations of personal data access options in IoT devices to determine whether or not a third party may be able to perform user profiling with the obtained data. The solution is integrated into a previous framework of the authors for the management of user consents. The solution is based on a machine learning technique that is tested with an approach simulating how users provide consents in home entertainment devices.

Thank you for the remarks and for the careful reading.

A first problem with this paper is, in my opinion, the lack of clarity in the presentation of the ideas. The introduction does not provide a clear synthesis of the motivation for the work based on an analysis of related work. Nor does the related work section show what the contribution of the paper is with respect to already existing solutions. It is not even clear what the contribution of the present paper is with respect to the previous work of the authors.

We would like to thank the reviewer for this constructive comment, which gives us the chance to clarify things further. The introduction provides a broad background on the topic that quickly gives the reader an appreciation of the wide range of privacy issues and also presents the motivation for the proposed framework. However, to make the introduction more substantial, we added references to substantiate the claim made in the first sentence (that is, references to other groups who do or have done research in this area). In addition, we have rearranged the introduction section, which now includes a clear synthesis of the motivation for the work based on an analysis of related work, as well as a discussion of the active measures users can take to protect their privacy with respect to already existing solutions. In our opinion, there is a serious gap in applied research and, more generally, in implementations or applications that could help users obtain meaningful information on how to deal with personal data leakage incidents and how to make optimal decisions about protecting their personal data, as well as in applications that thoroughly analyze the user's consents in order to identify possible actions aimed at profiling. We have also rearranged section 2 "Related Work" to make clear how the current study is unique, following the reviewer’s suggestion. Finally, our prior work in the privacy field is summarized in section 3 "The ADVOCATE Framework".

The two important techniques used in the work, namely Fuzzy Cognitive Maps (FCMs) and Extreme Learning Machines (ELMs), should be briefly described in a background section with appropriate examples. The reason why they were chosen should also be made very clear in that section. In general, I miss examples throughout the paper to help the reader understand the explanations given. For example, why don't you explain the semantic interpretation of some part of the graph in Figure 4? How do you interpret an arrow from unm to loh with a value of 0.15, etc.? What are the semantics of the arrows? (If one variable, or node, is true, what does the weight mean: the probability that the other may be deduced? The probability that the user has also given consent to the other one?)

The problem of recording user data from smart entertainment devices was modeled using FCMs. FCMs are an alternative method for modeling complex systems, capable of describing the causal relationships among the major concepts that determine the dynamic behavior of a system. In particular, they provide a symbolic description and representation of how a dynamic system is formed. The identification of conflicting rules that can lead to profiling, on the other hand, is done using ELMs. The distinctive characteristics of ELMs, such as efficiency, speed, and strong generalization capability, have been demonstrated across a wide range of problems from different disciplines, often with results comparable to or even better than those of classical machine learning or deep learning algorithms. Another factor that favors the use of ELMs in this specific privacy problem is that they work best when the input patterns come from the same boundary distribution or follow a common structure, as in the case we are examining. We have reduced the explanations of standard methods and techniques that can be found directly in the standard bibliography, in order to make room for descriptions that provide a genuine scientific contribution. Also, the proposed architecture allows the neural network to learn from unknown data much more easily. The model learns and performs well on unseen data, and it is capable of generalizing. This is the strongest indication that the method is robust against ML problems such as noise. In addition, an appropriate example of the use of FCMs is included in section 2 "Related Work".
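To make the edge semantics asked about in the review concrete, here is a minimal sketch of generic FCM inference; it is not the authors' model. In the usual FCM reading, a directed edge from concept j to concept i with weight w_ji in [-1, 1] states that an increase in concept j causes an increase (w_ji > 0) or a decrease (w_ji < 0) in concept i, with strength |w_ji|; the weight is a degree of causal influence, not a probability. The sketch assumes the commonly used update rule A_i(t+1) = f(A_i(t) + Σ_{j≠i} w_ji A_j(t)) with a sigmoid f. Only the edge unm → loh = 0.15 comes from the review; the `profiling` concept and its incoming weights are hypothetical.

```python
import math


def sigmoid(x, lam=1.0):
    """Squashing function that keeps concept activations in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-lam * x))


def fcm_step(state, weights, lam=1.0):
    """One synchronous update: A_i(t+1) = f(A_i(t) + sum over j != i of w_ji * A_j(t))."""
    new_state = {}
    for target in state:
        influence = sum(
            state[src] * w
            for (src, dst), w in weights.items()
            if dst == target and src != target
        )
        new_state[target] = sigmoid(state[target] + influence, lam)
    return new_state


def run_fcm(state, weights, lam=1.0, tol=1e-5, max_iter=100):
    """Iterate until no activation changes by more than tol, or max_iter is reached."""
    for step in range(1, max_iter + 1):
        nxt = fcm_step(state, weights, lam)
        if max(abs(nxt[k] - state[k]) for k in state) < tol:
            return nxt, step
        state = nxt
    return state, max_iter


# (source, target) -> causal weight in [-1, 1].
# unm -> loh = 0.15 is the arrow cited in the review; the other two weights are invented.
weights = {
    ("unm", "loh"): 0.15,
    ("unm", "profiling"): 0.6,
    ("loh", "profiling"): 0.7,
}
initial = {"unm": 1.0, "loh": 0.0, "profiling": 0.0}
final, steps = run_fcm(initial, weights)
print(f"converged after {steps} steps")
for concept, activation in final.items():
    print(f"{concept:>9}: {activation:.3f}")
```

With these hypothetical weights the map settles to a fixed point in which `profiling` is the most activated concept; changing a single weight and re-running shows how sensitive the outcome is to the expert-assigned values.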

The relationships drawn in the map of Figure 4 encode human experience and knowledge about the problem, gathered through continuous observation and intervention. Specifically, after a thorough and continuous study through trial and error, and after corresponding thresholds had been set for expert opinion, it was found that in every use scenario, if at least 8 of the parameters under test are recorded, profiling occurs. There is also an increased chance of profiling, in more than 95% of cases, when at least 3 of the following parameters are recorded: Username, Location_History, IP, Device_Number, and Interests. Applying the combination of these 2 rules to a total of 4096 cases labeled 2134 cases as Yes and 1962 as No. These classes illustrate the problem of user profiling through the recording of data from smart entertainment devices, and they can be used to train intelligent models or algorithms that inform the user immediately and support optimal decision-making on the security of their personal data and their protection against profiling. Appropriate explanations of how the parameters influence the results of the proposed model are given in sections 4 and 5. On the other hand, the exact descriptions of the causal relationships among the major concepts that determine the dynamic behavior of the proposed system, and the expert knowledge about the behavior of smart devices, cannot be fully specified in this paper. Thank you for this constructive comment.
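The two labeling rules can be checked by brute-force enumeration. The sketch below assumes each of the 12 parameters is a binary flag (recorded or not), which gives the 2^12 = 4096 cases mentioned above; only the five key parameter names come from the text, and the other seven are left unnamed. Read literally, the rules label 2133 cases Yes and 1963 No (2048 cases satisfy the three-of-five rule, and 85 more satisfy only the eight-of-twelve rule), one case away from the 2134/1962 split reported above; the difference may come from a labeling detail not spelled out here.

```python
from itertools import product

# Only these five names appear in the rules; the remaining seven parameters are unnamed.
KEY = ("Username", "Location_History", "IP", "Device_Number", "Interests")
N_PARAMS = 12


def is_profiling(case):
    """case: 12 binary flags (1 = recorded); the first five are the KEY parameters."""
    at_least_8_of_12 = sum(case) >= 8
    at_least_3_of_key = sum(case[:len(KEY)]) >= 3
    return at_least_8_of_12 or at_least_3_of_key


cases = list(product((0, 1), repeat=N_PARAMS))
yes = sum(is_profiling(c) for c in cases)
only_rule_1 = sum(sum(c) >= 8 and sum(c[:len(KEY)]) < 3 for c in cases)
print(len(cases), yes, len(cases) - yes, only_rule_1)  # 4096 2133 1963 85
```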

Regarding soundness, section 4 presents a feature selection problem, but the existing ML approaches to this problem are not even referenced.

Thank you for the remarks. Feature selection is a critical component of a data scientist’s workflow. When presented with very high-dimensional data, models often struggle, because training time grows rapidly with the number of features and the risk of overfitting increases. Feature selection methods help with these problems by reducing the dimensionality without much loss of the total information. They also help in interpreting the features and their importance. This study is based on a heuristic approach and is a proof of concept for this specific privacy problem. In section 4 “Scenarios and Data”, we present a binary problem with a total of 4096 cases. We believe that a standard explanation of ML feature selection is beyond the scope of this work.
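For comparison with the standard ML approaches the reviewer refers to, here is a minimal sketch of one common filter-based feature selection method: ranking features by their mutual information with the class label. It is not the procedure used in the paper. It assumes the setup of the two labeling rules described earlier in this record (12 binary parameters), where only the five key names are given; `other_1` to `other_7` are hypothetical placeholders.

```python
from collections import Counter
from itertools import product
from math import log2

KEY = ("Username", "Location_History", "IP", "Device_Number", "Interests")
# Hypothetical placeholders for the seven parameters not named in the text.
NAMES = KEY + tuple(f"other_{i}" for i in range(1, 8))


def is_profiling(case):
    """Label a case with the two rules: >= 8 of 12 recorded, or >= 3 of the 5 key ones."""
    return int(sum(case) >= 8 or sum(case[:len(KEY)]) >= 3)


def mutual_information(xs, ys):
    """I(X;Y) in bits, computed from the empirical joint distribution of two sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


cases = list(product((0, 1), repeat=len(NAMES)))
labels = [is_profiling(c) for c in cases]
scores = {
    name: mutual_information([c[i] for c in cases], labels)
    for i, name in enumerate(NAMES)
}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<17} {score:.4f}")
```

On this rule-generated data the five key parameters score highest and the other seven score far lower, which simply recovers the structure built into the rules; on real device logs such a ranking would be an empirical check rather than a foregone conclusion.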

The descriptions given throughout the paper are not sound and formal. See, for example, the first two paragraphs of section 5, where concepts such as state, feature, concept, neuron and property are used without any definition or even prior description. If you are using FCMs to simulate the ground truth, how do you evaluate the approach? How do you know that you are generating a dataset that represents this problem well?

Complex dynamical systems exhibit nonlinear behavior that cannot simply be derived by summing the behaviors of their individual components. In the case of smart entertainment devices, conventional modeling and control methods make only a limited contribution. Modeling these systems requires methods that can exploit existing knowledge and human experience. Human experience and knowledge of how the complex system operates are embedded in the structure of the FCM and in the FCM development methodology, that is, in the use of human experts who have observed the operation of the system and know its behavior under different circumstances. The objective of this research is to introduce a methodology for developing FCMs based on fuzzy logic theory, to investigate the advantages and potential uses of FCMs in modeling smart entertainment devices, and to show how FCMs can be used to exploit the knowledge and experience of privacy experts in describing and modeling the operation of a complex plant. The development of an FCM is based on using words to describe worlds. FCM concepts such as state, feature, concept, neuron, etc. represent knowledge and relate states, variables, events, inputs and outputs in a manner analogous to that of human beings. This methodology can help us construct sophisticated systems, as it is generally accepted that the more symbolic and fuzzy the representation used to model a system, the more sophisticated the system is. It should be noted that there is an exponentially large number of different variable combinations that could be chosen as system parameters, each giving a different view of the problem. The methodology presented in this paper is indicative of one way of modeling the problem in question. The FCM concepts, however, emerged after exhaustive testing (trial and error), taking into account the certainty, uncertainty, and risk of each scenario while keeping the model simple.
Trial and error is characterized by repeated, varied attempts that continue until success. To find the best solution with the proposed method, we evaluate each trial model against a predefined set of criteria, whose existence is a precondition for finding an optimal solution. This complicates the research problem: it requires different constraints, weights or learning rates to generalize over different data patterns, and it additionally requires prior knowledge in the form of a specification of the distribution from which the data sample originated. For this reason, the exact hyperparameters of the FCM model cannot be presented here. This is discussed in detail in section 5 “Methodology of IPAM based on FCM and ELM”. Thank you for this helpful comment.

An important part of the material of the paper is devoted to the description of FCMs and ELMs. An important part of section 5 is devoted to providing the complex mathematical background of FCMs and ELMs, which are not a contribution of the authors. Section 5 should concentrate on the description of the authors’ contributions. I would expect a paper more focused on how these two techniques were used to solve the problem. It seems that FCMs are used to simulate the ground truth and ELMs are used to solve the classification problem for decision support. However, this is not very clearly explained. For example, the definition of the two classes of “Automated individual decision-making, including profiling” is not clear. How do you know (when constructing the ground truth) whether a case belongs to one class or the other? What do you mean by profiling? Does it not depend on the specific application?

Thank you for this useful comment. In order to implement the proposed IPAM, the problem of profiling from smart entertainment devices was simulated. After a thorough study of the operation of these devices, a bibliographic review and an analysis of their manuals, we chose 12 variables that can be collected by smart entertainment devices to model the problem of profiling. These variables were turned into the following two rules for labeling a case as “profiling”: the class is Yes if at least 8 parameters are recorded, or if at least 3 of the parameters Username, Location_History, IP, Device_Number and Interests are recorded. Generally speaking, there are exponentially many combinations of variables that could be modeled as problem parameters, each giving a different view of the problem. The methodology presented in this paper is indicative of one way to handle the problem with a heuristic approach. This is discussed in detail in section 5 “Methodology of IPAM based on FCM and ELM”.
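A minimal sketch of how an ELM could be trained on data labeled by these two rules follows, using the basic batch ELM formulation: a single hidden layer whose input weights and biases are drawn at random and never trained, with output weights obtained in closed form through the Moore-Penrose pseudoinverse. It is an illustration under assumptions, not the IPAM implementation: the hidden-layer size (100), the sigmoid activation, the uniform weight range, the bipolar targets and the 75/25 split are all hypothetical choices, and the seven unnamed parameters are placeholders. It requires NumPy.

```python
import itertools

import numpy as np

# The five parameters named in the rules come first; the other seven are placeholders.
KEY = ["Username", "Location_History", "IP", "Device_Number", "Interests"]
PARAMS = KEY + [f"other_{i}" for i in range(1, 8)]


def label(case):
    """Rule-based ground truth: 1 if >= 8 of 12 recorded or >= 3 of the 5 key ones."""
    return int(sum(case) >= 8 or sum(case[:len(KEY)]) >= 3)


def make_dataset():
    X = np.array(list(itertools.product([0, 1], repeat=len(PARAMS))), dtype=float)
    y = np.array([label(row) for row in X.astype(int)])
    return X, y


class ELM:
    """Single-hidden-layer Extreme Learning Machine (basic batch form)."""

    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of a fixed random projection.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        # Input weights and biases are drawn once at random and never trained.
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        H = self._hidden(X)
        T = np.where(y == 1, 1.0, -1.0)  # bipolar targets
        # Output weights: least-squares solution via the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta >= 0).astype(int)


X, y = make_dataset()
perm = np.random.default_rng(1).permutation(len(X))
train, test = perm[:3072], perm[3072:]
model = ELM(n_hidden=100, seed=0).fit(X[train], y[train])
accuracy = float((model.predict(X[test]) == y[test]).mean())
print(f"held-out accuracy: {accuracy:.3f}")
```

Because the labels are a deterministic function of the inputs, held-out accuracy here says little about real-world performance; it only shows that the random-feature model can represent the rule set.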

Apart from the above comments, I found a sentence in the description of FCMs on page 6 (lines 151-153) that is identical to a sentence in the Wikipedia page on FCMs (https://en.wikipedia.org/wiki/Fuzzy_cognitive_map, third sentence in the Details section). Literal reuse of text in a definition is acceptable, but always with appropriate attribution to the source.

Thank you for this constructive comment. Following the reviewer’s suggestion, the appropriate reference has been added.

 

Round 2

Reviewer 2 Report

The authors have made an effort to better explain the motivation of their work in the introduction; however, I still miss a clear statement of the objective of the paper. Just before the outline (the last paragraph of the introduction), the authors should include a paragraph that describes the objective of the paper and highlights the main advantages of their approach.

The description of the main contribution in section 5 has been improved.

On page 8 the authors argue that: “The scenario of the 12 specific parameters with 2 possible states implements a model which is an abstract representation of the real system. This denotes that it implements only certain properties and characteristics of the real system, without taking into account all of them. The need for selection arises from the fact that real scenarios are extremely complex and cannot be fully represented”. In their comments, they also argue that feature selection is out of the scope of the paper; however, in my opinion it is clearly related to the above statement. In their argument, they claim that “This study is based on a heuristic approach and it is a proof of concept in this specific privacy problem”. Perhaps they should make this clear in the paper, and justify why they followed a heuristic approach instead of choosing a more exhaustive feature selection approach.

Author Response

Dear Respected Reviewer

Thank you very much for giving us an opportunity to revise our paper. We are grateful to you for your positive and constructive comments and suggestions on our paper. Those comments are all valuable and very helpful for revising and improving our manuscript. We have carefully revised the paper by following your comments.

Cordially

Konstantinos Demertzis, Konstantinos Rantos and George Drosatos

Reviewer 2

The authors have made an effort to better explain the motivation of their work in the introduction; however, I still miss a clear statement of the objective of the paper. Just before the outline (the last paragraph of the introduction), the authors should include a paragraph that describes the objective of the paper and highlights the main advantages of their approach.

Thank you for this constructive comment. Following the reviewer’s suggestion, we have added a paragraph to the introduction that describes the objective of the paper and highlights the main advantages of our approach. Specifically: “This paper presents the Intelligent Policies Analysis Mechanism (IPAM) of the ADVOCATE framework [12–14], which, in an intelligent and fully automated manner, can identify conflicting rules of the user’s consents which may lead to the collection of personal data and, consequently, be used for profiling. The main goal of the proposed approach is the design of an intelligent decision-making system for protecting the privacy of average users. This is achieved by simulating the processes of smart entertainment devices that can collect personal data from the user's environment. The IPAM is an innovative, reliable, low-demand and highly effective system based on sophisticated computational intelligent methods that deliver high-precision results, capable of responding to the problem, as well as in cases of similar complex situations.”

The description of the main contribution in section 5 has been improved.

Thank you for the remarks and for the careful reading.

On page 8 the authors argue that: “The scenario of the 12 specific parameters with 2 possible states implements a model which is an abstract representation of the real system. This denotes that it implements only certain properties and characteristics of the real system, without taking into account all of them. The need for selection arises from the fact that real scenarios are extremely complex and cannot be fully represented”. In their comments, they also argue that feature selection is out of the scope of the paper; however, in my opinion it is clearly related to the above statement. In their argument, they claim that “This study is based on a heuristic approach and it is a proof of concept in this specific privacy problem”. Perhaps they should make this clear in the paper, and justify why they followed a heuristic approach instead of choosing a more exhaustive feature selection approach.

Thank you for this useful comment. In general, a heuristic technique is any approach to problem-solving that uses a practical method which is not guaranteed to be optimal but is adequate for the immediate goals. Our methodology aims to identify conflicting rules or consents of the user that may lead to the collection of personal data, using the above research design parameters in a particular way to serve this purpose. Specifically, it proposes some basic rules, specified through a method for modeling complex systems that can describe the causal relationships among the major concepts that determine the dynamic behavior of smart entertainment systems. The proposed rules (as an alternative to a feature selection process) describe the causal relationships among the parameters of the problem we are examining. These causal relationships, which arise from existing knowledge and experience and take into account the certainty, uncertainty, and risk of each scenario while keeping the model simple, illustrate the different aspects of the operation of smart entertainment devices. In order to capture this knowledge, a directed graph is created, with the nodes representing the variables of the problem. An integral part of this methodology is verifying it through tests of validity, reliability, and range of findings. The main purpose of this is to predict the behavior of the system under given conditions, depending on whether or not each system parameter is recorded. This is discussed in detail in section 4 “Scenarios and Data”, following the reviewer’s suggestion.
