Open Access Review

Corpus-Based Paraphrase Detection Experiments and Review

by Tedo Vrbanec 1,* and Ana Meštrović 2,3,*
1 Faculty of Teacher Education, University of Zagreb, Savska Cesta 77, 10000 Zagreb, Croatia
2 Department of Informatics, University of Rijeka, Radmile Matejčić 2, 51000 Rijeka, Croatia
3 Center for Artificial Intelligence and Cybersecurity, University of Rijeka, Radmile Matejčić 2, 51000 Rijeka, Croatia
* Authors to whom correspondence should be addressed.
Information 2020, 11(5), 241; https://doi.org/10.3390/info11050241
Received: 6 February 2020 / Revised: 14 April 2020 / Accepted: 26 April 2020 / Published: 29 April 2020
Paraphrase detection is important for a number of applications, including plagiarism detection, authorship attribution, question answering, text summarization, and text mining in general. In this paper, we give a performance overview of various types of corpus-based models, especially deep learning (DL) models, on the task of paraphrase detection. We report the results of eight models (LSI, TF-IDF, Word2Vec, Doc2Vec, GloVe, FastText, ELMo, and USE) evaluated on three different publicly available corpora: the Microsoft Research Paraphrase Corpus, the Clough and Stevenson corpus, and the Webis Crowd Paraphrase Corpus 2011. Through a large number of experiments, we determined the most appropriate choices for text pre-processing, hyper-parameters, sub-model selection where sub-models exist (e.g., Skip-gram vs. CBOW), distance measures, and the semantic similarity/paraphrase detection threshold. Our findings, and those of other researchers who have used deep learning models, show that DL models are very competitive with traditional state-of-the-art approaches and have potential that should be further developed.
Keywords: semantic similarity; deep learning; paraphrasing corpora; experiments; natural language processing
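The abstract describes the general shape of the simpler corpus-based pipelines: represent each text as a vector, compare two vectors with a distance measure, and label the pair a paraphrase when the similarity clears a threshold that is itself chosen experimentally. The following is a minimal, dependency-free sketch of that idea for TF-IDF with cosine similarity. The tokenizer, the smoothed IDF formula, the 0.5 default threshold, the accuracy-based threshold grid search, and all function names (`detect_paraphrases`, `best_threshold`, etc.) are illustrative assumptions, not the authors' implementation or their reported settings.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Minimal pre-processing (an assumption): lowercase, keep alphanumeric tokens only.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    # Sparse TF-IDF vectors as {term: weight} dicts.
    # Smoothed IDF (assumed here): log((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    # Cosine similarity between two sparse vectors; 0.0 if either vector is empty.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def detect_paraphrases(
    pairs: list[tuple[str, str]], threshold: float = 0.5
) -> list[tuple[float, bool]]:
    # Fit IDF on every sentence in the pairs (the "corpus"), then score each pair.
    docs = [tokenize(s) for pair in pairs for s in pair]
    vecs = tfidf_vectors(docs)
    results = []
    for i in range(len(pairs)):
        score = cosine(vecs[2 * i], vecs[2 * i + 1])
        results.append((score, score >= threshold))
    return results


def best_threshold(scores: list[float], labels: list[bool]) -> tuple[float, float]:
    # Hypothetical threshold selection: grid-search 0.00..1.00 and keep the
    # first threshold that gives the highest accuracy on labeled pairs.
    best_t, best_acc = 0.0, -1.0
    for t in (i / 100 for i in range(101)):
        acc = sum((s >= t) == y for s, y in zip(scores, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


if __name__ == "__main__":
    pairs = [
        ("The cat sat on the mat.", "A cat was sitting on the mat."),
        ("Stock prices fell sharply today.", "The weather is sunny and warm."),
    ]
    results = detect_paraphrases(pairs)
    for (a, b), (score, label) in zip(pairs, results):
        print(f"{score:.3f} {label} | {a} | {b}")
    print(best_threshold([s for s, _ in results], [True, False]))
```

Swapping `tfidf_vectors` for averaged word embeddings or a sentence encoder would leave the rest of the sketch unchanged, which is roughly why the threshold and distance measure can be tuned separately from the vector model. In a real evaluation the threshold would be chosen on a training split and then applied to held-out pairs, rather than tuned on the same pairs it is scored on.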

Figure 1

MDPI and ACS Style

Vrbanec, T.; Meštrović, A. Corpus-Based Paraphrase Detection Experiments and Review. Information 2020, 11, 241.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
