Peer-Review Record

RSVN: A RoBERTa Sentence Vector Normalization Scheme for Short Texts to Extract Semantic Information

Appl. Sci. 2022, 12(21), 11278; https://doi.org/10.3390/app122111278
by Lei Gao 1, Lijuan Zhang 1,*, Lei Zhang 1,2 and Jie Huang 1,*
Submission received: 15 September 2022 / Revised: 22 October 2022 / Accepted: 5 November 2022 / Published: 7 November 2022
(This article belongs to the Special Issue Intelligent Control Using Machine Learning)

Round 1

Reviewer 1 Report

This paper addresses Chinese-language entity linking between short texts and knowledge base articles, using a version of RoBERTa modified to produce usable sentence vectors. The research is strongest in presenting its results: you present the data and evaluation clearly, and you probe the performance well with an ablation study and other small comparisons. However, the paper's presentation can be improved: you could present, justify, and outline your study much better, to help ensure that readers actually make it through to the results.

Below are some suggestions and requests for improving the communication of the paper.

You need to define the problem better at the start. The introductory line is weak - consider something more direct than the very general 'Web has become one of the largest databases in the world'. Define anisotropy early - it's not a term that all readers will be familiar with (though I was glad to see the Ethayarajh 2020 paper cited). Similarly, you refer to entity linking multiple times in the introduction, but you don't actually describe what it is. Make sure to explain it for an unfamiliar reader.

I think the correction for anisotropy is sensible to pursue. It has been shown repeatedly that raw outputs of transformer-based models cannot be used (well) as vector embeddings, e.g. by taking the [CLS] token representation, so I'm glad to see that you're applying that flow correction. However, a few points. First, I don't think that justification is made clear enough for a reader of your paper! Second, you don't explain why the entity linking approach has to work with vector embeddings at all, rather than as a system that adds a classification layer straight onto RoBERTa. Is it for computational performance? Whatever the reason, preemptively justify your choice for the reader.
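To make that point concrete, here is a minimal sketch of the two naive ways of turning RoBERTa output into a sentence vector that the anisotropy literature criticizes. This is my own illustration, not the authors' code; it assumes the Hugging Face transformers library, and the checkpoint name (hfl/chinese-roberta-wwm-ext) is just one example of a Chinese RoBERTa-style model.

    # Illustrative only: naive [CLS] vs. mean-pooled sentence vectors from a
    # Chinese RoBERTa-style checkpoint. The checkpoint name is an assumption.
    import torch
    from transformers import AutoTokenizer, AutoModel

    name = "hfl/chinese-roberta-wwm-ext"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)

    sentences = ["苹果发布了新款手机", "苹果是一种常见的水果"]
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

    with torch.no_grad():
        hidden = model(**batch).last_hidden_state        # (batch, seq_len, dim)

    cls_vectors = hidden[:, 0, :]                        # first-token ([CLS]) vectors

    mask = batch["attention_mask"].unsqueeze(-1).float() # ignore padding positions
    mean_vectors = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

    # Without a normalization / correction step, cosine similarities between such
    # vectors tend to be uniformly high because the embedding space is anisotropic.
    sim = torch.nn.functional.cosine_similarity(cls_vectors[0], cls_vectors[1], dim=0)
    print(float(sim))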

The primary paper that I think is missing from this work is the SBERT paper by Reimers and Gurevych. That was the first big paper to show the limits of naive BERT sentence embeddings, so it supports your argument. At the same time, you should also address why you chose not to use a pre-trained, customized-for-embeddings transformer system like SBERT (though SBERT itself has been surpassed; for example, there's the sentence-transformers library, Sentence-T5 from Ni et al., and the GPT-3 embeddings from Neelakantan et al.). I don't know whether any of those work well with Chinese, so that may be a strong reason, but you should at least address it; see the small sketch after the references below for how cheaply such a baseline could be checked.

> Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv:1908.10084. https://arxiv.org/abs/1908.10084v1
> Ni, J., Ábrego, G. H., Constant, N., Ma, J., Hall, K. B., Cer, D., & Yang, Y. (2021). Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models. arXiv:2108.08877. https://doi.org/10.48550/arXiv.2108.08877
> Neelakantan, A., Xu, T., Puri, R., Radford, A., Han, J. M., Tworek, J., Yuan, Q., Tezak, N., Kim, J. W., & Hallacy, C. (2022). Text and Code Embeddings by Contrastive Pre-Training. arXiv:2201.10005.
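For what it's worth, trying such an off-the-shelf baseline is only a few lines of code. A minimal sketch of what I have in mind (my suggestion, not part of the manuscript; the multilingual checkpoint name below is an assumption, and whether it handles Chinese short texts well is exactly what you would need to report):

    # Illustrative baseline using the sentence-transformers package (Reimers & Gurevych).
    # The multilingual checkpoint below is an assumption on my part.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

    mention = "小米发布了新款手机"
    candidate_entities = ["小米（公司）", "小米（谷物）"]

    mention_vec = model.encode(mention, convert_to_tensor=True)
    entity_vecs = model.encode(candidate_entities, convert_to_tensor=True)

    print(util.cos_sim(mention_vec, entity_vecs))   # 1 x num_candidates similarity scores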

The related work section is fairly good. It is odd to suggest that sparse matrix approaches were used prior to word2vec - shouldn't you make note of, at least, Latent Semantic Analysis from Deerwester and others?

"To sum up, the Word2Vec model considers contextual information"
- This is incorrect. Word embedding models do not factor in context - each word in the vocabulary maps to a single point in the vector space, regardless of the sentence it appears in. It wasn't until CoVe that such models started to include context.

The first sentence of Section 2.2 needs editing - it is overlong and hard to follow. You should also consider citing ELMo for multi-layer contextual models.

I like that you introduce BERT and RoBERTa through their strengths, though I wouldn't position BERT's primary contribution as being a 'pre-trained model'. Word2Vec also shipped pre-trained models. The fact that pre-trained models were released with BERT and its successors is important, but it is not what primarily distinguishes them. What's notable is the Transformer architecture, which can learn complex, contextual representations from large amounts of text.

- What you mean by this line is unclear: "With the development of wireless sensor network, it provides the basis for many intelligent applications"

- Please define what is meant by 'short' and 'long', so that the reader understands how you think of them. For example, I would consider most applications to be based on short texts, in that large language models tend to support no more than 512 or 1024 tokens, whereas 'long documents' are things like books or essays. This is clearly not how you think of them.

- You did a good job demonstrating the problem with a practical example in Fig 1. I like the examples that you also showed later for the evaluation dataset - these do a good job of helping a reader understand the details.

- Section 3.1 needs editing. I understood that there needs to be a candidate search process, to lower the pairwise comparisons between knowledge base articles and entities, but it was difficult to follow.

- The approach of prepending category labels to the sentence prior to extracting the embeddings was interesting (see the small sketch just below for how I read that step).
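As I read it, this step amounts to concatenating the category label with the short text before it is encoded. A toy sketch of that reading (the separator and example label are my assumptions, not taken from the paper):

    # Toy illustration of my reading of the category-prepending step; the separator
    # and example label are assumptions, not taken from the paper.
    def with_category(category: str, text: str, sep: str = "：") -> str:
        """Prepend a category label so it becomes part of the encoded sentence."""
        return f"{category}{sep}{text}"

    print(with_category("公司", "小米发布了新款手机"))   # -> "公司：小米发布了新款手机"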

Typos:
This is not a complete list - this paper needs a close English language review.
- 'prominent part in the area of' -> 'prominent part of'
- 'the researches on' -> 'research on'
- 'Roberta' should be cased as 'RoBERTa' throughout
- 'afterwards, entity linking model' -> 'afterwards, the entity linking model'
- 'that are associated to' -> 'that are associated with'

Author Response

Please see the attachment

Author Response File: Author Response.docx

Reviewer 2 Report

The paper presents an interesting and timely topic. The paper is well written and structured. The experiment is performed competently and the results are clearly presented. My only complaint is that the study limitations should be added and the results compared to other studies.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

This revision has satisfied my earlier concerns. The work is sound, and this new revision has improved the presentation and communication. Thank you for the care taken in the edit.
