Review

Corpus-Based Paraphrase Detection Experiments and Review

by Tedo Vrbanec 1,* and Ana Meštrović 2,3,*
1
Faculty of Teacher Education, University of Zagreb, Savska Cesta 77, 10000 Zagreb, Croatia
2
Department of Informatics, University of Rijeka, Radmile Matejčić 2, 51000 Rijeka, Croatia
3
Center for Artificial Intelligence and Cybersecurity, University of Rijeka, Radmile Matejčić 2, 51000 Rijeka, Croatia
*
Authors to whom correspondence should be addressed.
Information 2020, 11(5), 241; https://doi.org/10.3390/info11050241
Received: 6 February 2020 / Revised: 14 April 2020 / Accepted: 26 April 2020 / Published: 29 April 2020
Paraphrase detection is important for a number of applications, including plagiarism detection, authorship attribution, question answering, text summarization, and text mining in general. In this paper, we give a performance overview of various types of corpus-based models, especially deep learning (DL) models, on the task of paraphrase detection. We report the results of eight models (LSI, TF-IDF, Word2Vec, Doc2Vec, GloVe, FastText, ELMo, and USE) evaluated on three publicly available corpora: the Microsoft Research Paraphrase Corpus, the Clough and Stevenson corpus, and the Webis Crowd Paraphrase Corpus 2011. Through a great number of experiments, we determined the most appropriate choices for text pre-processing, hyper-parameters, sub-model selection where variants exist (e.g., Skip-gram vs. CBOW), distance measures, and the semantic similarity/paraphrase detection threshold. Our findings and those of other researchers who have used deep learning models show that DL models are very competitive with traditional state-of-the-art approaches and have potential that should be further developed.
Keywords: semantic similarity; deep learning; paraphrasing corpora; experiments; natural language processing
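
The general pipeline named in the abstract (a vector representation of each text, a distance or similarity measure, and a decision threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only TF-IDF with cosine similarity, the function names (`tokenize`, `tfidf_vectors`, `cosine_similarity`, `is_paraphrase`) are hypothetical, the IDF statistics are computed from the two compared texts alone rather than from a full corpus, and the threshold of 0.5 is an arbitrary placeholder, not a value reported in the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents in which each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF stays positive even for terms shared by every document.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_paraphrase(text_a, text_b, threshold=0.5):
    """Flag a pair as a paraphrase when similarity reaches the threshold.

    The default threshold is an illustrative assumption; in practice it
    would be tuned per model and per corpus.
    """
    vec_a, vec_b = tfidf_vectors([text_a, text_b])
    return cosine_similarity(vec_a, vec_b) >= threshold
```

In a realistic setting the IDF weights would be fitted on the whole corpus, and the same thresholding step would be applied to similarities computed from the other representations (e.g., averaged word embeddings or sentence encoders).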
MDPI and ACS Style

Vrbanec, T.; Meštrović, A. Corpus-Based Paraphrase Detection Experiments and Review. Information 2020, 11, 241. https://doi.org/10.3390/info11050241
