Peer-Review Record

Distributed Online Multi-Label Learning with Privacy Protection in Internet of Things

Appl. Sci. 2023, 13(4), 2713; https://doi.org/10.3390/app13042713
by Fan Huang, Nan Yang, Huaming Chen, Wei Bao and Dong Yuan *
Reviewer 2:
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 10 January 2023 / Revised: 12 February 2023 / Accepted: 15 February 2023 / Published: 20 February 2023
(This article belongs to the Special Issue Big Data Security and Privacy in Internet of Things)

Round 1

Reviewer 1 Report

Reviewer #:

In this study, the researchers propose a novel multi-label learning method that removes the step of sharing data sources between distributed IoT nodes. The work addresses online-training nodes of IoT-based applications by providing a multi-label classification algorithm. The research pursues a worthwhile goal, which can improve the performance of the Internet of Things. Nevertheless, I have some concerns about the idea and the manuscript, as follows:

In abstract

(1) Please, describe all abbreviations before employing them in the manuscript (such as Internet of Things (IoT)).

(2) The abstract does not successfully present the innovation of the idea in comparison with similar works.

(3) The authors point out the security challenge of the previous studies, namely sharing data sources between distributed nodes; however, they do not present the feature of this study that overcomes the problem.

(4) The authors only present the methods of the study, while its achievements remain invisible.

(5) I recommend inserting quantitative results into the abstract to prove the study's advantage over similar works.

(6) The structure of some sentences is not appropriate for conveying the concepts, such as "In our proposed method, each computing node trains a model online with its local online data and uses a novel discrete-time update algorithm to learn from other nodes’ training results.".

In introduction

(7) All employed abbreviations have to be defined before use, e.g., "RSS" (RDF Site Summary (RSS)).

(8) The authors refer to old references to prove the efficiency of the idea. This is not a proper way to open the introduction of original research, define the problem, and point out the challenges of previous studies.

(9) The authors keep repeating "online multi-label classification" within the short paragraphs of the section, which deteriorates the quality of the text and is tiring for readers.

(10) A large number of case studies address reinforcement learning and online training to support timely decision-making in IoT, besides employing unsupervised learning. Please present a strong reason proving the efficiency of multi-label learning for improving the performance of IoT-based applications.

(11) Please introduce the organization of the remaining sections of the manuscript.

In related work

(12) I faced some structural problems with sentences such as "As the name indicates, problem transformation methods are generally concerned with converting a multi-label classification problem into many single-label classification problems.". The authors use "problem" several times in this sentence. Please reorganize it and similar cases.
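For clarity: the quoted sentence describes what the multi-label literature commonly calls binary relevance, i.e., training one independent single-label (binary) classifier per label. A minimal sketch of that transformation, with toy classifier and helper names of my own (not taken from the manuscript):

```python
class MajorityLabelClassifier:
    """Toy binary learner: predicts the majority value seen in training."""
    def fit(self, X, y):
        self.pred = int(sum(y) * 2 >= len(y))
        return self

    def predict(self, X):
        return [self.pred for _ in X]

def binary_relevance_fit(X, Y, base=MajorityLabelClassifier):
    # Y holds one 0/1 label vector per example; train one binary
    # model per label column (the "problem transformation" step).
    n_labels = len(Y[0])
    models = []
    for j in range(n_labels):
        yj = [row[j] for row in Y]   # single-label view of label j
        models.append(base().fit(X, yj))
    return models

def binary_relevance_predict(models, X):
    # Recombine the per-label predictions into multi-label outputs.
    cols = [m.predict(X) for m in models]
    return [list(row) for row in zip(*cols)]
```

Any real base learner can be substituted for the toy classifier; the decomposition itself is the point.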

(13) The authors refer to old references on multi-label classification, which makes for an unfair analysis when proving the efficiency of the proposed idea. I recommend analyzing recent studies in the research area to demonstrate the superiority of this study in comparison with them.

(14) Please define abbreviations before using them in the manuscript's text, such as "OML", etc. (the authors define it in later sections).

(15) The authors use repeated words in consecutive sentences, such as "On the other hand, the ensemble approach works by combining multiple weaker classifiers into one stronger classifier. The ensemble approach is more straightforward to scale and parallelize than a single approach.", etc.; this issue lowers the quality of the manuscript.

(16) The authors mention performance and its improvement in different sentences and sections of the manuscript. Please clarify which aspect of performance is meant; the term covers different concepts such as energy efficiency, delay, latency, availability, security, inference time, and timely decision-making.

(17) The main goal of a related work section in original research is to clarify the advantages and weaknesses of related studies, thereby proving the efficiency of the proposed idea in facing their problems. I recommend adding a paragraph at the end of the related work section that describes the idea's advantages in comparison with the reviewed studies.

In problem definition

(18) This section is essentially part of the introduction and would usually appear as a subsection. The current structure of the manuscript seems messy and may confuse readers. I recommend either making it a subsection of the introduction or retitling the section as the proposed idea and reorganizing it so that it classifies and supports the following sections as subsections (in general, I recommend reorganizing this section and the structure of the manuscript).

(19) Please describe the ranges of all employed variables, such as n, i, j, etc. (for example, i > 0).

(20) The highlighted issue of the study: do the authors have any explanation or justification for special scenarios in the real world, or for expected events in the communication between devices of an IoT-based application? The authors point out real-time applications, the absence of data-source sharing between distributed nodes, and communication between neighbouring pairs. What happens when node 1 is unavailable for a short time, or in similar scenarios (Figure 1)?

In the distributed online multi-label classification algorithm

(21) Please describe the ranges of all variables in the equations.

(22) The equations seem to follow the base model of projection (linear algebra) to perform mapping operations. Please refer to the main 1979 reference when presenting the equations if you followed that formula; otherwise, clarify the difference between the proposed equations and the base projection model.

(23) I think that the j parameter, as the column index of the graph matrix, has to start at zero (j = 0) to determine the degree of a node in the equation for d_i. Please recheck this issue and correct it in related cases.

(24) Please define the m index and its range.
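Points (23) and (24) both concern indexing conventions. A minimal sketch of what I mean, with hypothetical names A and m (the manuscript's actual notation may differ): with 0-based indexing, the column index j of the adjacency matrix runs over 0 … m − 1 when computing the degree d_i.

```python
def degree(A, i):
    """Degree d_i of node i: row sum of the 0/1 adjacency matrix A.

    m (the number of nodes) is a positive integer; with 0-based
    indexing the column index j runs from 0 to m - 1 inclusive.
    """
    m = len(A)
    return sum(A[i][j] for j in range(m))
```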

In our proposed algorithm

(25) In IoT applications and real-time tasks, much information is updated at every time t. Please clarify the benchmark used to determine the period for sending updated information to neighbours. I also think you will face energy-efficiency and latency problems when continuously listening under heavy traffic. Do you have any academic justification or results for this issue?

(26) Please use an integrated, standard structure for the algorithm. I faced some inappropriate structures such as "for t =", "if instance", etc.

(27) I faced some grammatical and typo problems, such as "Lines 5 through 6 indicate that the end device of the current node fetched a set of pairwise instances (xt, yt) at time t and added them to the cache. After this, each node will calculate the corresponding Ei and Fi based on Equation (11) and (12), with i denoting the number of the current node.".

In experiment setup

(28) I think that precision plays a highlighted role in online multi-label classification for real-time applications. Usually, accuracy and precision cannot both reach high values at the same time. Please clarify this issue if you have another view on the problem.
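To make the distinction concrete, here is a sketch of the two example-based metrics as commonly defined in the multi-label literature (helper names are my own; whether the manuscript uses exactly these definitions is for the authors to confirm). A prediction can achieve perfect precision while its accuracy is penalized for missed labels:

```python
def example_precision(Y_true, Y_pred):
    # Example-based precision: |true ∩ pred| / |pred|, averaged over examples.
    scores = []
    for t, p in zip(Y_true, Y_pred):
        true = {j for j, v in enumerate(t) if v}
        pred = {j for j, v in enumerate(p) if v}
        scores.append(len(true & pred) / len(pred) if pred else 1.0)
    return sum(scores) / len(scores)

def example_accuracy(Y_true, Y_pred):
    # Example-based accuracy (Jaccard): |true ∩ pred| / |true ∪ pred|.
    scores = []
    for t, p in zip(Y_true, Y_pred):
        true = {j for j, v in enumerate(t) if v}
        pred = {j for j, v in enumerate(p) if v}
        union = true | pred
        scores.append(len(true & pred) / len(union) if union else 1.0)
    return sum(scores) / len(scores)
```

For instance, predicting only one of two true labels gives precision 1.0 but accuracy 0.5, which is exactly the divergence the comment refers to.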

In results

(29) I recommend adding an introduction to the section; it seems unsuitable to place two headings in tandem.

(30) I faced some inappropriately structured sentences, such as "In DOML, the entire dataset will be equally distributed among the three nodes to test whether the global performance of DOML is comparable to that of the current well-performance online multi-label algorithms and batched multi-label algorithms under ideal conditions.", etc.

(31) The authors start some subsections (such as 7.1.1) with tables and figures, which does not seem proper in the manuscript.

(32) Please justify and explain the reported results. The proposed algorithm (DOML) shows degraded performance (Hamming loss) on some datasets (CAL500, COREL5k, SCENE, and YEAST). Why?
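For reference, Hamming loss counts the fraction of individual label positions predicted incorrectly, so lower is better; a sketch assuming 0/1 label matrices (my own function name, not the manuscript's):

```python
def hamming_loss(Y_true, Y_pred):
    # Fraction of label positions where prediction and truth disagree.
    errors = sum(t != p
                 for t_row, p_row in zip(Y_true, Y_pred)
                 for t, p in zip(t_row, p_row))
    return errors / (len(Y_true) * len(Y_true[0]))
```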

(33) Results can only be compared when the different algorithms face the same node connectivity. The authors report the results of OML without making its connectivity clear (Tables 4-7).

(34) The reported results show the weak efficiency of DOML when facing a rising number of nodes, which hurts the performance of IoT-based applications. Please justify this problem if you have a reason (Tables 4-5).

 

Finally

(35) I recommend reviewing recent studies on online learning, training, and multi-label classification methods, and proving the efficiency of the proposed idea in comparison with the newer works in the research area.

(36) Please revise the English writing of the manuscript with high accuracy. I faced many typos and grammatical problems, some of which are pointed out above.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

1. Here, the ML algorithm runs on distributed nodes. A federated learning approach would appear to be the same. Justify how your approach differs from FL.
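To sharpen the question, the structural difference I would expect the authors to articulate is sketched below with hypothetical weight vectors: FedAvg-style aggregation requires a central server averaging all clients, whereas a fully distributed scheme lets each node average only with its direct neighbours.

```python
def fedavg_step(local_weights):
    # Server-side aggregation: average every client's weight vector.
    m = len(local_weights)
    return [sum(ws) / m for ws in zip(*local_weights)]

def gossip_step(weights, adjacency, i):
    # Peer-to-peer aggregation: node i averages its own weights
    # with those of its direct neighbours only (no central server).
    group = [weights[i]] + [weights[j]
                            for j, a in enumerate(adjacency[i]) if a]
    return [sum(ws) / len(group) for ws in zip(*group)]
```

Whether the manuscript's discrete-time update matches this gossip pattern is for the authors to confirm; the sketch only illustrates the server-vs-serverless distinction.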

2. Here, only accuracy is taken for benchmarking; what about execution time?

3. The different online multi-label classification algorithms could be tabulated in the literature survey.

 

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

The article “Distributed Online Multi-Label Learning with Privacy Protection in Internet of Things” is of considerable interest. It proposes a distributed approach to online multi-label classification problems that allows interaction between different nodes and modifies the metric model to achieve the performance of centralized multi-label classification algorithms without any raw data transfer.

Below are comments and suggestions:

· Do not use abbreviations in the abstract.

· In line number 180, "…problem of interest in this chapter…": what chapter? Correct or explain this.

· Explain the terminology of Equations 9-11.

· Line number 229 states, "All experiments are conducted on a desktop with Intel Core i5-7500 @ 3.40GHz and 16GB RAM, running on Python 3.8 with Windows 10 platform." Are there any specific hardware requirements? Why have you given this? If yes, then clarify/define them.

· What are the criteria for selecting the datasets for the study?

· The authors have used the word "we" many times; it would be better to rephrase these sentences.

· The results shown in Tables 6 and 7 require more explanation.

· The results shown in Figures 5 and 6 require more explanation.

· Rewrite the conclusion section and include some important results.

 

· In the conclusion section, add recommendations based on the findings, as well as practical and theoretical implications.

 

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 4 Report

The paper presents a distributed online multi-label learning model with privacy protection in the Internet of Things. The authors developed a distributed learning model that eliminates the need for a centralized server. The experimental evaluation, covering various accuracy and performance metrics, has been conducted and clearly explained. The paper is well written and includes comprehensive details about the developed model and its specifications. There are a few concerns, as outlined below:
1- The abstract should contain the key outperforming results.
2- It would be nice to add a workflow model in the methodology section, with some key parameters that are used in the developed model (Section 5).
3- The authors should elaborate more on the experimental evaluation of the possible drawbacks of the developed model.
4- The conclusion section should include key possible future challenges.

5- The reference list should include more related work from the last five years.

 

Best regards,

 

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Reviewer # 1

The authors have made a good effort to overcome the mentioned concerns, though some of them remain challenges because of their impact on the quality of the manuscript, as follows:

Before addressing the previous comments from the first round of the reviewing process:

· I faced a glaring problem in the manuscript: "?" signs appear in the text instead of numbers, such as "Figure ?", "Table ?", etc.

· The list of references was removed from the manuscript.

The remaining concerns of the previous review are as follows:

Point 13: The authors refer to old references on multi-label classification, which makes for an unfair analysis when proving the efficiency of the proposed idea. I recommend analyzing recent studies in the research area to demonstrate the superiority of this study in comparison with them.

Response 13: Thank you for pointing it out. We have updated the reference list as mentioned in the previous response. Our research question addresses a new problem, so we used the control-variates method to construct the experiment. The recent studies do not solve the same problem as ours and thus have limited value for comparison. In addition, our proposed method operates under a harder scenario than the other methods in the experiments. If the performance is the same, that would show we have overcome the problem caused by the difficult scenario.

Re-comment: Because the references list was removed from the manuscript, I had no access to the updated resources to investigate them.

Point 19: Please describe the ranges of all employed variables, such as n, i, j, etc. (for example, i > 0).

Response 19: Thank you for your feedback. The subscripts ‘i’ and ‘j’ denote the name of the node, which is not a variable; hence the ranges do not apply to them. According to your comments, we have added some statements describing the related variables, such as ‘n denotes the number of examples accumulated on the node.’ The number of examples is a positive integer and does not require a special value-range definition.

Re-comment: I realize that i and j are the index numbers of the nodes (mine was only an example). Nevertheless, the ranges of the indexes, variables, and all employed parameters have to be specified. You have to clarify where the node numbering starts and ends.

Point 22: The equations seem to follow the base model of projection (linear algebra) to perform mapping operations. Please refer to the main 1979 reference when presenting the equations if you followed that formula; otherwise, clarify the difference between the proposed equations and the base projection model.

Response 22: Thank you for your research. It is correct that linear algebra is the foundation of the optimisation problem. In our study, the proposed method was inspired by Wang, X.'s research in 2019. We applied it to a multi-label classification problem to solve our proposed scenario. That research work has been cited in the distributed OMLC algorithm section.

Re-comment: Please refer to the mentioned reference (Wang, X.'s research in 2019) when presenting the equations inspired by that resource.

Point 24: Please define the m index and its range.

Response 24: Thank you for your comment. In the problem definition, we wrote sentences such as ‘So, we can suppose there is a network composed of m computing nodes in which two nodes that can communicate directly with each other are called neighbours.’ ‘m’, as the number of nodes, can only be a positive integer. According to the conventions for writing mathematical descriptions, such variables do not require an additional declaration of their value range.

Re-comment: Same as the re-comment on Response 19.

Point 26: Please use an integrated, standard structure for the algorithm. I faced some inappropriate structures such as "for t =", "if instance", etc.

Response 26: Thanks for your suggestion. We have reviewed many pseudocode examples published in top conferences to ensure that our usage is correct. As a result, we think that “for t = …” is standard usage. However, “if instance … < s” was an explanation of “if n < s”; in order to meet the standard, we have changed it back.

Re-comment: Any format of algorithm presentation can be found across various manuscripts, but that does not justify the issue unless you have a strong reference verifying the pattern. A proper algorithm starts with the parameter declarations and follows an integrated structure for "for", "parallel", and "if" instructions (such as "for … do", "if … do", etc.). This helps improve the quality of the manuscript and the legibility of the algorithm. You may continue with your format (it is only a recommendation).

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

The authors have incorporated the changes.

 

Author Response

Thank you for your feedback.
