Article
Peer-Review Record

Graph Embedding-Based Domain-Specific Knowledge Graph Expansion Using Research Literature Summary

Sustainability 2022, 14(19), 12299; https://doi.org/10.3390/su141912299
by Junho Choi
Reviewer 1:
Reviewer 2:
Submission received: 30 July 2022 / Revised: 22 September 2022 / Accepted: 23 September 2022 / Published: 27 September 2022
(This article belongs to the Special Issue Challenges for Future Applications of Smart Industries)

Round 1

Reviewer 1 Report

The manuscript, “Graph embedding-based domain-specific knowledge graph expansion using research literature summary,” by Junho Choi presents a domain-specific knowledge graph expansion mechanism based on graph embedding. First, the author conducts preprocessing and summarization of literature data using the BERTSUM model. The author then proposes a method of graph expansion using Google News after extracting, from the web, information related to the entities in the generated knowledge graph. The author addresses a real-world problem of generating and expanding knowledge bases and investigates the graph-embedding problem using a research literature summary. Moreover, the author measured the performance of the proposed model using mean reciprocal rank, mean rank, and Hits@N rank-based evaluation metrics. The manuscript is well written and well presented, and the topic of the paper is appropriate for the prestigious Sustainability journal.
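For reference, the three rank-based metrics named above can be computed from the rank assigned to each correct triple, as in the following illustrative sketch (the function name is the reviewer's own, not from the manuscript):

```python
def link_prediction_metrics(ranks, n=10):
    """Compute MRR, MR, and Hits@N from the rank of each correct triple.

    ranks: list of 1-based ranks at which each correct triple was scored.
    """
    mrr = sum(1.0 / r for r in ranks) / len(ranks)        # Mean Reciprocal Rank
    mr = sum(ranks) / len(ranks)                          # Mean Rank
    hits = sum(1 for r in ranks if r <= n) / len(ranks)   # Hits@N
    return mrr, mr, hits

mrr, mr, hits = link_prediction_metrics([1, 2, 10, 50], n=10)
```

Higher MRR and Hits@N values and a lower MR value indicate better link-prediction performance.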

Still, the reviewer found some minor issues and mistakes in this manuscript that need to be addressed.

The detailed comments are as follows:

1.      The reviewer found that some abbreviations were used without proper definitions. For example, in the abstract, BERTSUM is used without being defined. Similarly, [CLS] is used without a proper definition. It is recommended to define abbreviations on their first occurrence.

2.      The reviewer also found a double definition of an abbreviation: Mean Rank (MR) is defined in the Abstract section and used again in the Experimental section, where the author redefines it as “MR is the mean rank value of all triples.” It is therefore recommended to add a table defining all abbreviations for the reader’s comfort.

 

The references section contains 25 references, which the reviewer recommends extending by citing related work and survey articles from well-known journals. For example, the following could be cited to improve the reference section:

1. https://doi.org/10.1016/j.neucom.2021.02.098 

2. https://doi.org/10.3390/su14148877

3. https://doi.org/10.1109/COMST.2022.3178081

4. https://doi.org/10.1145/3409481.3409485

Overall, the manuscript is suitable for publication in a prestigious journal such as Sustainability. I would recommend acceptance.

 

 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Although this article may include some novel ideas, there are the following issues:

- The novelty is not well described:

- Part of the description appears in Section 2.2, starting with the following sentence: “In this paper, the BERTSUM language model is used for the research literature text summarization.”

- If this is only the usage of an existing method in your approach, move this part into the next section. Or is BERTSUM a novel method distinct from the BERT model?

- The novelty is not clear in the approach described in Section 3.

- There is another term, RE-BERT, and it is not clear whether it is a novel model and how it relates to BERT and BERTSUM.

- It is similar in Section 4:

- References or descriptions of the Transformer and RNN methods are missing in Section 4.1.

- The results in Section 4.2 are compared in only a few sentences.

- There are missing references, for example:

- p1: Recently, research has been conducted on how to build a knowledge base ..

- p2: Furthermore, knowledge graphs provide a method for reconstructing ... 

- p2: They require a huge amount of time and manpower, and the pre-processing process ...

- p4: ... the TransE learning method, which is a basic knowledge graph embedding method.

- p4: Typical techniques required for knowledge graph generation are NER and relation extraction between entities.

 

Other notices:

- pre-processing process - pre-processing method?

- p4: use: 'and relation vectors~[12]' instead of 'and relation vectors [12]'

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

My comments are similar to those on the previous version. In my opinion, especially the first 10 pages of the article are not well written. It is rather hard to distinguish the original and novel ideas. As a result, it is necessary to completely rewrite these pages. There are mainly the following two issues:

1. What are novel and original ideas?

- In the abstract we can read: we summarize research literature using the Bidirectional Encoder Representations from Transformers for Summarization (BERTSUM) model based ...

  - What does it mean? 

- later: ... and design a Research-BERT (RE-BERT)

   - Ok, BERTSUM is an existing method and RE-BERT is the novel method; however, later: the result shows that the BERTSUM Classifier model’s ...

    - Why is RE-BERT not reported?

- Why is RE-BERT not mentioned in the last paragraph of the first section, where the structure of the paper is described?

- Similarly, page 4: The pre-training process of the BERT model refers to the method of learning similar data in a large volume for solving problems that need to be solved in the future [27]. In this paper, the BERTSUM language model is used for the research literature text summarization. BERTSUM is a structure proposed for summarizing documents using the BERT model and is a pre trained model.

  - I can read that BERT was introduced in [27], but BERTSUM is not referenced; therefore it looks like a novel method. But you proposed RE-BERT as the novel method.

2. Issues in Section 3:

- In Section 3.1 you introduce 4 layers of the framework. However, the relation of the following Sections 3.2 and 3.3 to these layers is not explained. It is therefore hard to follow these sections.

  - Figure 6 shows the architecture of Named Entity Recognition module.

    - What is the link between this module and the layers depicted above?

 

Other notices:

- abstract: processing process

- missing articles, e.g. in natural language (p1)

- missing references, e.g.:

  - Knowledge graphs can be used to combine and manage knowledge data from various sources. (p2)

  - RNN, Transformer (p11)

- p2: Figure 1. An example of how to represent an entity-relationship of a knowledge graph.

  - use: Figure 1. An example of an entity-relationship of a knowledge graph.

- p2: or extract the relationship by

  - What relationship?

- into a k-dimensional vector

  - use $k$

- p3: Figure 2: KG is not introduced.

- p3: Is Figure 2 meaningful for understanding of the TransE learning method? It seems that it is not.

- p9: Figure 7 shows the architecture of Named Entity Recognition module.

  - However, the caption of Figure 7 is: Relationship Extraction Module.

- p13: ... a better performance in terms of speed and knowledge graph quality.

  - Where can I read results of the speed comparison?

Author Response

Dear Reviewer,

Thank you very much for the valuable suggestions and comments. We have responded below to all comments and made corresponding adjustments in the paper. Thank you for the review.

Reviewer#2, Concern # 1:

  1. What are novel and original ideas?

- In the abstract we can read: we summarize research literature using the Bidirectional Encoder Representations from Transformers for Summarization (BERTSUM) model based ...

  - What does it mean? 

- later: ... and design a Research-BERT (RE-BERT)

   - Ok, BERTSUM is an existing method and RE-BERT is the novel method; however, later: the result shows that the BERTSUM Classifier model’s ...

    - Why is RE-BERT not reported?

- Why is RE-BERT not mentioned in the last paragraph of the first section, where the structure of the paper is described?

- Similarly, page 4: The pre-training process of the BERT model refers to the method of learning similar data in a large volume for solving problems that need to be solved in the future [27]. In this paper, the BERTSUM language model is used for the research literature text summarization. BERTSUM is a structure proposed for summarizing documents using the BERT model and is a pre trained model.

  - I can read that BERT was introduced in [27], but BERTSUM is not referenced; therefore it looks like a novel method. But you proposed RE-BERT as the novel method.

 

Author response: Thanks for raising the question.

This paper is a study on automatically generating a knowledge graph from research literature. Generating a knowledge graph involves extracting entities and relationships. In previous research, entities and relationships were extracted directly from the original literature data to generate a knowledge graph.

However, in this paper, we show that better results can be obtained by extracting entities and relationships after applying a summarization step to the research literature data. As described earlier, the document summarization in this paper uses the previously designed BERTSUM model. BERTSUM is a structure proposed for performing document summarization based on the pre-trained language model BERT.

Document summarization selects sentences with key content based on the semantic relationships between sentences. BERTSUM adds a [CLS] token to the beginning of each input sentence to represent that sentence. Additionally, segment embeddings with alternating values of 0 and 1 are used to distinguish adjacent sentences.
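This input construction can be sketched as follows (whitespace tokenization is used only for illustration; the actual model uses the BERT WordPiece tokenizer):

```python
def build_bertsum_input(sentences):
    """Prepend [CLS] and append [SEP] to each sentence, assigning
    alternating segment ids (0, 1, 0, 1, ...) so the model can
    distinguish adjacent sentences, as in BERTSUM."""
    tokens, segment_ids = [], []
    for i, sentence in enumerate(sentences):
        sent_tokens = ["[CLS]"] + sentence.split() + ["[SEP]"]
        tokens.extend(sent_tokens)
        segment_ids.extend([i % 2] * len(sent_tokens))
    return tokens, segment_ids

tokens, segs = build_bertsum_input(["graphs store facts", "embeddings encode them"])
```

The [CLS] vector produced for each sentence is then scored to decide whether that sentence belongs in the extractive summary.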

In this paper, we propose the RE-BERT model, modified from the BERT model, to extract entities and relationships from the research literature summarized with the BERTSUM model.

After applying average pooling (using the mean value) and max pooling (using the largest value), the entities are extracted through a fully connected layer and a softmax layer; this process is described in the paper.

 

The relationship extraction model uses the results from the entity extraction model as input to the BERT model. It embeds the type information of each entity, combines it with the output of the language model, and finally extracts the relationship.

The RE-BERT model presented in this paper is designed so that the extracted entity types HEAD_Type and TAIL_Type are additionally input to the BERT model. The output of the RE-BERT model, after average pooling and max pooling, is concatenated with the type embedding values. The concatenated result then passes through a fully connected layer, and the relationship is finally classified through a softmax operation.
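This pooling-and-concatenation step can be sketched in plain Python as follows (vector dimensions and names are illustrative only, not the actual implementation):

```python
def pool_and_concat(token_vectors, type_embedding):
    """Average-pool and max-pool the token vectors along the sequence,
    then concatenate both pooled vectors with the entity-type embedding.
    The resulting feature vector would feed a fully connected layer
    followed by a softmax for relationship classification."""
    dim = len(token_vectors[0])
    avg = [sum(vec[d] for vec in token_vectors) / len(token_vectors)
           for d in range(dim)]                       # average pooling
    mx = [max(vec[d] for vec in token_vectors)
          for d in range(dim)]                        # max pooling
    return avg + mx + type_embedding                  # concatenation

features = pool_and_concat([[1.0, 2.0], [3.0, 4.0]], [0.5])
```

The classifier layers themselves are standard fully connected and softmax layers, so the design choice specific to RE-BERT is the injection of the HEAD_Type and TAIL_Type embeddings into this concatenated feature vector.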

Author action: We updated the manuscript by citing the article. The related reference is listed below.

  1. Liu, Y.; Lapata, M. Text summarization with pretrained encoders. arXiv preprint arXiv:1908.08345 2019.

 

 

Reviewer#2, Concern # 2:

- In Section 3.1 you introduce 4 layers of the framework. However, the relation of the following Sections 3.2 and 3.3 to these layers is not explained. It is therefore hard to follow these sections.

- Figure 6 shows the architecture of Named Entity Recognition module.

- What is the link between this module and the layers depicted above?

 

Author response: We have corrected the title of Figure 6.

  • Figure 6. Entity Extraction Module.

 

 

Reviewer#2, Concern # 3:

- abstract: processing process

- missing references, e.g.:

- Knowledge graphs can be used to combine and manage knowledge data from various sources. (p2)

- RNN, Transformer (p11)

 

Author response: We have modified it to the following sentence:

  • we perform the preprocessing and text summarization with the collected research literature data.

We modified the reference and added the RNN-related reference.

  1. Guo, L.; Zhang, Q.; Ge, W.; Hu, W.; Qu, Y. DSKG: A deep sequential model for knowledge graph completion. In Proceedings of China Conference on Knowledge Graph and Semantic Computing 2018; pp. 65-77.

 

Author action:  

 

Reviewer#2, Concern # 4

- p2: Figure 1. An example of how to represent an entity-relationship of a knowledge graph.

- use: Figure 1. An example of an entity-relationship of a knowledge graph.

- p2: or extract the relationship by

- What relationship?

 

Author response: We have modified the figure captions as follows.

  • Figure 1. An example of an entity-relationship of a knowledge graph.
  • Figure 2. An example of a knowledge graph embedding for relationship inference.

 

 

Reviewer#2, Concern # 5

- into a k-dimensional vector

- use $k$

- p3: Figure 2: KG is not introduced.

- p3: Is Figure 2 meaningful for understanding of the TransE learning method? It seems that it is not.

 

Author response:  Thanks for the suggestion.  We have revised this on page 2 of the paper.

 

Reviewer#2, Concern # 6

- p9: Figure 7 shows the architecture of Named Entity Recognition module.

- However the caption of Figure 7 is: Relationship Extraction Module.

 

Author response: We updated the manuscript after a careful check. We have revised this on page 9 of the paper.

  • Figure 7. Relationship Extraction Module.

 

- p13: ... a better performance in terms of speed and knowledge graph quality.

- Where can I read results of the speed comparison?

Author response:  Thanks for the suggestion.  We have revised this on page 13 of the paper.

 

 

Thank you very much for your conscientiousness and broad knowledge of the relevant research area. We have carefully revised the manuscript based on all your comments above.
