Open Access Article

Peer-Review Record

Research on Intelligent Perception Algorithm for Sensitive Information

Appl. Sci. 2023, 13(6), 3383; https://doi.org/10.3390/app13063383

by Lin Huo^1,*,† and Juncong Jiang^2,†

Reviewer 1: Zuchao Li

Reviewer 2: Anonymous

Submission received: 21 January 2023 / Revised: 3 March 2023 / Accepted: 5 March 2023 / Published: 7 March 2023

(This article belongs to the Special Issue New Techniques of Machine Learning and Deep Learning in Text Classification)

Round 1

Reviewer 1 Report

The paper attempts to detect private information in documents using natural language processing models. Relevant research already exists in this field; this work differs from its predecessors in adopting the ERNIE model proposed by Baidu together with an RNN for text analysis. Proposing its own intelligent algorithm for sensitive text information, the study performs a relatively straightforward binary classification of texts, which demonstrates a fair level of novelty. The Experiment section shows that the proposed model outperforms current models on a majority of metrics, which supports the capabilities and prospects of the model.

Strengths:

1. This paper introduces the strengths and weaknesses of related work in detail.

2. TSIIP is trained and tested on datasets such as THUNews, Chinese Wikipedia, and JWBD, a Chinese sensitive information dataset.

3. The KG-ERNIE model uses a knowledge graph-based entity embedding technique and an ERNIE-based pre-training model to encode input text and extract semantic information and features.

4. TSIIP can detect sensitive words at the word level and sensitive statements at the statement level.

Weaknesses:

1. The evaluation strategy and standard of this model are relatively simple, and the evaluation score only considers single-level semantics.

2. There are many confusing points; see Comments.

Comments:

1. On the source of data

In the Experiment section, the paper refers to the datasets used, stating that it adopts data from THUNews as white samples and that it also uses the JWBD dataset, yet it fails to explain how the “sensitive word category library” (which is used in the classifier) was obtained or constructed. The intended use of JWBD is also left unclear.

2. On result analysis

The paper lacks an explanation of why ERNIE is inferior to BERT in precision yet better than BERT in recall, F1 score, and F2 score in Figure 5.

Near the end of the paper, the experimental outcomes are presented, and a conclusion is drawn from a comparison between the proposed model and existing ones on four metrics: precision, recall, F1-score, and F2-score. However, the paper does not attempt to explain the causes of these outcomes, notably that the model is outperformed by BERT on precision. It is somewhat disappointing that the study offers no hypotheses about this issue and never mentions it in the main text.
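As background to this comment, the following minimal sketch shows how the F-beta score weights recall against precision, which is one way a model with lower precision but higher recall can still score higher on F1 and F2. The function name `f_beta` and the two sets of precision and recall values are hypothetical illustrations; they are not taken from the paper or from Figure 5.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta = 1 gives the F1 score; beta = 2 gives the F2 score,
    which weights recall more heavily than precision.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical values for illustration only (not results from the paper).
model_a = {"precision": 0.95, "recall": 0.80}  # higher precision
model_b = {"precision": 0.90, "recall": 0.92}  # higher recall

for name, m in (("A", model_a), ("B", model_b)):
    f1 = f_beta(m["precision"], m["recall"], beta=1.0)
    f2 = f_beta(m["precision"], m["recall"], beta=2.0)
    print(f"model {name}: F1={f1:.3f} F2={f2:.3f}")
```

With these assumed inputs, model B scores higher on both F1 and F2 despite its lower precision, because its recall advantage outweighs its precision deficit. An explanation of this kind, grounded in the actual error counts, is what the comment asks the authors to provide.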

3. Minor comments: On the meaning of the study and future goal

1) In the conclusion, it would be better if the paper reinforced the real-life benefits the work may bring, or related them to the current progress of the study and estimated its practical impact. Further emphasizing this point near the beginning of the paper would also be desirable.

2) At the end of the paper, the future research goal is stated as upgrading the current simple binary classification to a quantitative assessment of text sensitivity. Is it possible to expand on that and relate it to longer-term goals? Possible aspects for improvement include the model architecture and extended application areas of the research.

4. Expressions:

1) Line 217 contains a layout error; the expression should read w_1^{i−1}.

2) On Line 249, in “The final score Score”, the word “score” is redundant.

3) When the paper describes its future work at the end, the phrasing is rather perplexing. It might be better to phrase it this way: “In our future work, we intend to further develop our current binary classification system into a quantitative assessment of the text sensitivity level.”

4) There appears to be a grammatical error on line 22: “perception” is a noun in “to automatically perception” and should be replaced by “perceive”. For example, the authors use “to automatically perceive sensitive data” on line 323.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The paper presents the Text Sensitive Information Intelligent Perception algorithm (TSIIP), which detects sensitive words.

The paper is well written; however, the related work section contains no comparison with other intelligent perception algorithms. Please add this comparison.

Also, add the structure of the paper at the end of the introduction

In the method section, kindly explain the ERNIE and KG models with the help of an example

Algorithm 1 needs to be explained with the help of an example

What were the criteria for data set selection?

Experimental results can be presented in tabular form

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Summary:

This paper proposes an intelligent perception algorithm (TSIIP) for sensitive information classification at the word level and the statement level, together with a recognition model that extracts text features using a knowledge graph and a pretrained ERNIE model. The experimental results on the proposed dataset show that this method performs better than other SOTA methods.

Strengths:

1. This paper proposed a Chinese sensitive information library based on previous Chinese datasets

2. The proposed TSIIP is a novel method to evaluate sensitive scores of texts from a multi-level perspective

3. The experimental results on the proposed dataset show that this method performs better than other SOTA methods

Weaknesses:

1. The paper contains a great deal of unnecessary narrative; for example, considerable space is devoted to ERNIE and BERT. Most of this text is not relevant to the method proposed in this paper or could be summarized briefly

2. It is not clear from the text what the initial input to the KG model is. Where do {w_1^0, w_2^0, ..., w_n^0} and {e_1^0, e_2^0, ..., e_n^0} come from?

Assuming that {w_1^0, w_2^0, ..., w_n^0} and {w_1, w_2, ..., w_n} are equivalent, and that {e_1^0, e_2^0, ..., e_n^0} and {e_1, e_2, ..., e_n} are equivalent, where does the structured semantic information "knowledge graphs" described in the text come from?

Furthermore, how does the direct input of text vectors {e_1, e_2, ..., e_n} into the KG model reflect the "entity information" mentioned in the text? Is there any processing of {e_1, e_2, ..., e_n} in the text to label the "entities" in the vector?

3. The term “Text Sensitive Information Intelligent Perception (TSIIP)” appears in full many times in the text; the abbreviation is sufficient from the second use onward

4. Limited by the gap between the training data and inaccessible real private data, and by training data drawn from limited fields, this method may have weak generalization ability in practical applications.

5. In line 220, there is the phrase “lose the information about the features between words and words and between words”. This statement is incomprehensible, and it is suggested that the wording be reconsidered.

Typos:

1. Line 255 : imputs->inputs

2. Equations 4 and 5: the term to the right of “=” should not need to be superscripted with “~”

Author Response

Thank you very much for your approval of our revised manuscript.

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors have addressed most comments except for a clear comparison in related work.

Kindly add a clear comparison paragraph at the end of the related work section. The rest is good to go.

Author Response

Thank you very much for your approval of our revised manuscript.

Please see the attachment.

Author Response File: Author Response.pdf
