
Open Access Article

Peer-Review Record

Anonymous Methods Based on Multi-Attribute Clustering and Generalization Constraints

Electronics 2023, 12(8), 1897; https://doi.org/10.3390/electronics12081897

by Yunhui Fan, Xiangbo Shi, Shuiqiang Zhang and Yala Tong *

Reviewer 1: Muhammad Adil

Reviewer 2: Anonymous

Reviewer 3: Anonymous


Submission received: 14 March 2023 / Revised: 14 April 2023 / Accepted: 14 April 2023 / Published: 17 April 2023

(This article belongs to the Section Networks)

Round 1

Reviewer 1 Report

In the first line of the abstract, the abbreviation IoT is not defined. Please define every abbreviation and notation at first use.

I suggest that the authors add a table of all abbreviations to the revised paper.

The description of the proposed model is empty. Please add more information on how this framework works, step by step.

Can you mention any particular application of IoT where the proposed scheme should be applied?

What are the limitations of this scheme? Every model has some disadvantages.

I can’t see any comparative work. Can you answer this question: why is this work needed in the presence of other papers?

Author Response

Response to Reviewer 1 Comments

Point 1: In the first line of the abstract, the abbreviation IoT is not defined. Please define every abbreviation and notation at first use.

Response 1: Each abbreviation and symbol is now defined at first use.

Point 2: I suggest that the authors add a table of all abbreviations to the revised paper.

Response 2:

Abbreviation	Definition
IoT	Internet of Things
S attributes	Sensitive attributes
l-diversity	Diversity model requiring at least l well-represented S-attribute values in each equivalence class
QI	Quasi-identifier
KNN	K-nearest neighbors
MCKL	Multi-attribute clustering and generalization constrained (k,l)-anonymity algorithm
CKA	Cluster-based k-anonymity algorithm
CKL	Cluster-l-diversity-based k-anonymity algorithm

Point 3: The description of the proposed model is empty. Please add more information on how this framework works, step by step.

Response 3: The algorithm performed width-first sorting of the attributes in the multidimensional data table using a greedy strategy. It selected the attribute with the largest width value as the division dimension, started the division from that attribute, and repeated these steps recursively on the remaining subspaces until no subspace could be divided further, yielding the generalization hierarchy of the attributes. When dividing the equivalence classes, the distance metric between attributes was combined with an improved KNN clustering: the cluster centers were determined from the width-first result, and the (k-1) records closest to each initial cluster center were found, so that the records of each cluster were grouped into the same equivalence class. The generalization hierarchy above was then used to generalize each cluster hierarchically, which avoids significant information loss and improves data usability. At the same time, to address the insufficient diversity of the anonymity model, frequency-diversity constraints were applied to the S attributes according to the frequency criterion when dividing the equivalence classes, and an upper bound was set on the frequency of sensitive values: the frequency of any S-attribute value was not greater than f (f = s/l, where s was the number of S-attribute values in the equivalence class). As a result, S-attribute values were similar within a group, while the importance of S attributes differed between groups. (Lines 266-283.)
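The steps in Response 3 (greedy width-first division, KNN grouping into clusters of k records, then generalization of each cluster) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' MCKL implementation: quasi-identifiers are assumed numeric, an attribute's width is assumed to be its max minus min range, subspaces are split at the median (a Mondrian-style choice), a subspace counts as indivisible when no attribute admits a clean median cut or it holds fewer than 2k records, and each cluster is generalized to per-attribute [min, max] ranges rather than the authors' hierarchy. All function names are hypothetical.

```python
from math import dist


def widths(records, dims):
    """Width of each attribute over the given records (assumed: max - min)."""
    return {d: max(r[d] for r in records) - min(r[d] for r in records) for d in dims}


def width_first_partition(records, dims, k):
    """Greedy width-first division: cut the widest attribute at its median and
    recurse until no subspace can be split into two halves of >= k records."""
    if len(records) < 2 * k:
        return [records]
    w = widths(records, dims)
    # Try attributes from the largest width to the smallest.
    for d in sorted(dims, key=lambda a: w[a], reverse=True):
        ordered = sorted(records, key=lambda r: r[d])
        mid = len(ordered) // 2
        left, right = ordered[:mid], ordered[mid:]
        # Accept only a clean cut, i.e. no value shared by both halves.
        if left[-1][d] < right[0][d]:
            return (width_first_partition(left, dims, k)
                    + width_first_partition(right, dims, k))
    return [records]  # no attribute can be cut: the subspace is indivisible


def knn_clusters(records, k):
    """Hypothetical KNN grouping: take the first unused record as a center and
    attach its k-1 nearest remaining records; leftovers join the nearest cluster."""
    if len(records) < k:
        raise ValueError("a subspace needs at least k records")
    remaining = list(records)
    clusters = []
    while len(remaining) >= k:
        center = remaining.pop(0)
        remaining.sort(key=lambda r: dist(center, r))
        clusters.append([center] + remaining[:k - 1])
        remaining = remaining[k - 1:]
    for r in remaining:
        min(clusters, key=lambda c: dist(c[0], r)).append(r)
    return clusters


def generalize(cluster, dims):
    """Replace each attribute by its (min, max) range within the cluster."""
    return tuple((min(r[d] for r in cluster), max(r[d] for r in cluster)) for d in dims)


def anonymize(records, k):
    dims = range(len(records[0]))
    groups = []
    for leaf in width_first_partition(records, dims, k):
        for cluster in knn_clusters(leaf, k):
            groups.append((generalize(cluster, dims), cluster))
    return groups


records = [(25, 50), (27, 52), (26, 51), (60, 90), (62, 88), (61, 91)]
for ranges, members in anonymize(records, k=3):
    print(ranges, len(members))
# ((25, 27), (50, 52)) 3
# ((60, 62), (88, 91)) 3
```

Running KNN inside each width-first leaf is one simple way to let the division result steer where the cluster centers come from; the authors' exact center-selection rule is not given here.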

Point 4: Can you mention any particular application of IoT where the proposed scheme should be applied?

Response 4: Application scenario: data protection when data tables are published and shared in IoT applications.

Point 5: What are the limitations of this scheme? Every model has some disadvantages.

Response 5: The limitation of this solution is that it has not yet achieved better results in terms of running time.

Point 6: I can’t see any comparative work. Can you answer this question: why is this work needed in the presence of other papers?

Response 6: The experiments compared the multi-attribute clustering and generalization constrained (k,l)-anonymity (MCKL) algorithm with the cluster-based k-anonymity algorithm (CKA) and the cluster-l-diversity-based k-anonymity (CKL) algorithm. The aim of this work is to reduce information loss during anonymization and to strengthen the constraints on attributes.

(Lines 303-314).

Reviewer 2 Report

Privacy protection is a very important issue in IoT applications.

This paper designed an anonymization method based on multi-attribute clustering and generalization constraints (MCKL) and validated this mechanism through simulations with a qualified ML dataset.

When compared with the CKA and CKL algorithms, the proposed MCKL algorithm was shown to effectively avoid long runtimes and reduce information loss.

This paper presented concrete numerical steps for the ‘Distance Metric’ and ‘Accuracy Measurements’ to demonstrate the reliability of the MCKL generalization structure.

The introduction and references are sufficient for understanding previous research trends and current issues in privacy protection for IoT.

This paper seems to contain no logical errors and presents a concrete research procedure and results when explaining the numerical modeling process of the new MCKL mechanism, its simulation experiments, and its performance metrics.

Author Response

With regard to the English language, the paper has been revised using MDPI's English editing service.

Reviewer 3 Report

The paper titled “Anonymous methods based on multi-attribute clustering and generalization constraints” explored the k-anonymization algorithm and discussed its shortcomings, such as over-generalization and insufficient attribute diversity constraints. Based on these points, the authors propose a multi-attribute clustering constrained (k, l)-anonymization method that can be applied to multi-dimensional data. Their algorithm first determines the generalized attributes using width-first sorting and constructs a generalization hierarchy. It then hierarchically divides the equivalence classes using a knn clustering strategy based on the distance metric to generalize the attributes. Experiments show that the algorithm reduces the information loss in the anonymization process and improves data availability.

The authors claim that this work protects multidimensional data tables and can enhance the ability to resist background knowledge attacks and homogeneity attacks. A detailed discussion is needed to support this aspect of the technique.

It is hard to understand how Table 2 is created from Table 1. Please explain these steps, or, if the data in Table 2 were obtained from another source, discuss it. Can you find an alternative dataset?

It is better to write KNN instead of knn.

Write descriptive captions (see Figure 2). The algorithm flowchart does not convey any meaningful information.

The paper needs rewriting, as the contribution seems hard to understand in its current form. Try to add more aspects and see if you can add appropriate constraints. Some related papers that could help are “Comparison and analysis of greedy energy-efficient scheduling algorithms for computational grids” and “Power efficient rate monotonic scheduling for multi-core systems”.

Typos:

“The data table 1, shown in the figure below, consists of column attributes and row tuples”: it is suggested to use proper labels such as “Table 1” for tables and figures at all applicable places in the paper.

Author Response

Point 1 & 5: The authors claim that this work protects multidimensional data tables and can enhance the ability to resist background knowledge attacks and homogeneity attacks. A detailed discussion is needed to support this aspect of the technique.

Response 1 & 5: The lack of practical constraints on the S attributes, which simply controlled the number of values assigned to the attributes, led to insufficient diversity constraints in the (k,l)-anonymity model. The S attributes were more concentrated and could not effectively resist background knowledge and homogeneity attacks, even at a leakage rate of 1/k, which reduced the quality and usability of the anonymized data. This paper used an improved frequency-diversity constraint to address the lack of diversity constraints in the anonymization model. Compared with the basic l-diversity constraint, which only controlled the number of S-attribute values, the improved frequency-diversity constraint ensured that each equivalence class contained at least l distinct S-attribute values, while also restricting the S attributes according to the frequency criterion: an equivalence class was constructed by selecting l records with different S-attribute values and minimum distance to each other. This ensured that the selected records had different sensitive values and that the frequency of any S-attribute value was not greater than f (f = s/l, where s is the number of S-attribute values in the equivalence class); therefore, the values of S attributes remained similar within groups and different between groups. This method can resist background knowledge and homogeneity attacks, reduce the risk of S-attribute leakage, and improve privacy protection and data availability. (Lines 244-260.)
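The frequency-diversity rule described in Response 1 & 5 can be expressed as a small check plus a greedy class builder. This is a hedged sketch, not the authors' code: records are assumed to be (quasi-identifier tuple, sensitive value) pairs, s is taken to be the number of records (S-attribute values) in the class, and "minimum distance to each other" is simplified to "nearest to a seed record". The function names are hypothetical.

```python
from collections import Counter
from math import dist


def satisfies_frequency_diversity(sensitive_values, l):
    """Frequency-diversity check for one equivalence class: at least l distinct
    S values, and no value appears more than f = s / l times (s = class size)."""
    s = len(sensitive_values)
    if s == 0:
        return False
    counts = Counter(sensitive_values)
    f = s / l
    return len(counts) >= l and max(counts.values()) <= f


def build_diverse_class(seed, pool, l):
    """Greedy sketch: starting from a seed record, add the nearest records whose
    S value is not yet in the class until l distinct S values are present."""
    members = [seed]
    used = {seed[1]}
    for rec in sorted(pool, key=lambda r: dist(seed[0], r[0])):
        if len(used) == l:
            break
        if rec[1] not in used:
            members.append(rec)
            used.add(rec[1])
    if len(used) < l:
        raise ValueError("pool lacks l distinct sensitive values")
    return members


print(satisfies_frequency_diversity(["flu", "flu", "cold", "hiv"], l=2))  # True
print(satisfies_frequency_diversity(["flu", "flu", "flu", "cold"], l=2))  # False
```

Note that the frequency bound already implies the distinctness requirement (if every value occurs at most s/l times, at least l values are needed to reach s records); both conditions are kept in the check only to mirror the wording of the response.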

Point 2: It is hard to understand creation of Table 2 from Table 1.You can explain these steps. OR if datasheet in Table 2 is obtained from another sources, discuss it. Can you find an alternative dataset?

Response 2: Table 2 is a (3,2)-anonymous table, obtained from Table 1 by applying (k,l)-anonymity with k = 3 and l = 2.
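Whether a generalized table is (k,l)-anonymous can be checked mechanically: group the rows by their generalized quasi-identifier values and require every group to hold at least k rows and at least l distinct sensitive values. The sketch below assumes rows are (generalized QI tuple, sensitive value) pairs; the toy rows are hypothetical and do not reproduce the paper's Table 1 or Table 2.

```python
from collections import defaultdict


def is_kl_anonymous(rows, k, l):
    """rows: (generalized QI tuple, sensitive value) pairs.
    True if every equivalence class has >= k rows and >= l distinct S values."""
    classes = defaultdict(list)
    for qi, sensitive in rows:
        classes[qi].append(sensitive)
    return all(len(vals) >= k and len(set(vals)) >= l for vals in classes.values())


# Hypothetical generalized table: two equivalence classes of three rows each.
table = [
    (("2*", "476**"), "flu"),
    (("2*", "476**"), "cold"),
    (("2*", "476**"), "flu"),
    (("3*", "479**"), "hiv"),
    (("3*", "479**"), "flu"),
    (("3*", "479**"), "cancer"),
]

print(is_kl_anonymous(table, k=3, l=2))  # True
print(is_kl_anonymous(table, k=4, l=2))  # False: each class has only 3 rows
```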

Point 3: It is better to write KNN instead of knn.

Response 3: The changes have been made throughout the text to address this point.

Point 4: Write descriptive captions (see Figure 2). The algorithm flowchart does not convey any meaningful information.

Response 4: The captions in the text have been revised to be descriptive.

Point 6: Typos

Response 6: “data table 1” has been changed to “Table 1”.

Round 2

Reviewer 1 Report

Thanks for responding to my comments. At this point, I do not have further suggestions to the authors.

Author Response

Dear Reviewer: The paper has been checked for English language and style.

Reviewer 3 Report

My comments have been incorporated properly in the revised draft.

The literature survey can be improved with related works: “Cost efficient resource allocation for real-time tasks in embedded systems” and “Comparison and analysis of greedy energy-efficient scheduling algorithms for computational grids”.

Author Response

Dear Reviewer:

Point 1: The literature survey can be improved with related works: “Cost efficient resource allocation for real-time tasks in embedded systems” and “Comparison and analysis of greedy energy-efficient scheduling algorithms for computational grids”.

Response 1: The resource allocation problem in real-time systems is NP-hard, and it is especially challenging when these systems are deployed in cloud computing environments where task execution involves deadline constraints. Scholars have proposed a hybrid of cuckoo search and a genetic algorithm, called HGCS (hybrid genetic and cuckoo search), which adds genetic operators to the cuckoo search algorithm. This leads to a rigorous exploration of the solution space to find the best feasible plan that can execute the tasks in the shortest possible time, thus reducing the total resource usage cost. As technology advances to enable collaboration and resource sharing through software, the size and energy consumption of computational grids continue to increase; scholars have therefore proposed using heuristics to schedule the tasks of computational grids, with greedy heuristics used to find energy-conscious solutions.

The relevant literature is cited in lines 83-93 of the paper.
