Recognizing Textual Entailment: Challenges in the Portuguese Language

LIACC/DEI, Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias, 4200-465 Porto, Portugal
Author to whom correspondence should be addressed.
This manuscript is an extended version of “Recognizing Textual Entailment and Paraphrases in Portuguese”, presented at the Text Mining and Applications (TeMA) track of the 18th EPIA Conference on Artificial Intelligence (EPIA 2017) and published in Progress in Artificial Intelligence, Springer LNAI 10423, pp. 868–879.
Information 2018, 9(4), 76
Received: 28 January 2018 / Revised: 20 March 2018 / Accepted: 26 March 2018 / Published: 29 March 2018
Recognizing textual entailment is the task of determining semantic entailment relations between text fragments. A text fragment entails another text fragment if, from the meaning of the former, one can infer the meaning of the latter. If such a relation is bidirectional, then we are in the presence of a paraphrase. Automatically recognizing textual entailment relations captures major semantic inference needs in several natural language processing (NLP) applications. As in many NLP tasks, textual entailment corpora for English abound, while the same is not true for more resource-scarce languages such as Portuguese. Exploiting what seems to be the only Portuguese corpus for textual entailment and paraphrases (the ASSIN corpus), in this paper we address the task of automatically recognizing textual entailment (RTE) and paraphrases in text written in the Portuguese language, by employing supervised machine learning techniques. We employ lexical, syntactic and semantic features, and analyze the impact of semantic-based approaches on the performance of the system. We then try to take advantage of the bi-dialect nature of ASSIN to compensate for its limited size. With the same aim, we explore modeling the task of recognizing textual entailment and paraphrases as a binary classification problem, by treating a paraphrase as a pair of entailment relations holding in both directions. Addressing the task as a multi-class classification problem, we achieve results in line with the winner of the ASSIN Challenge. In addition, we conclude that semantic-based approaches are promising in this task, and that combining data from European and Brazilian Portuguese is less straightforward than it may initially seem. The binary classification modeling of the problem does not seem to bring advantages over the original multi-class model, despite the outstanding results obtained by the binary classifier for recognizing textual entailments.
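The relation between directional entailment and paraphrase described above can be illustrated with a minimal Python sketch. It is not the system used in this work: the function names, the label strings and the toy token-subset entailment test are all assumptions made for illustration, standing in for a trained binary classifier.

```python
from typing import Callable

# Hypothetical label names for the three-way task (assumed for illustration).
NONE, ENTAILMENT, PARAPHRASE = "None", "Entailment", "Paraphrase"


def toy_entails(text: str, hypothesis: str) -> bool:
    """Toy stand-in for a binary entailment classifier (an assumption):
    the hypothesis is 'entailed' if all of its tokens occur in the text."""
    return set(hypothesis.lower().split()) <= set(text.lower().split())


def combine_directions(
    entails: Callable[[str, str], bool], text: str, hypothesis: str
) -> str:
    """Derive a three-way label from two directional binary decisions."""
    forward = entails(text, hypothesis)   # does text entail hypothesis?
    backward = entails(hypothesis, text)  # does hypothesis entail text?
    if forward and backward:
        return PARAPHRASE                 # entailment holds in both directions
    if forward:
        return ENTAILMENT                 # entailment holds only text -> hypothesis
    return NONE                           # no entailment from text to hypothesis
```

In this sketch, a pair is labeled as a paraphrase only when the binary decision is positive in both directions, so each annotated pair yields two binary instances, which is one way to obtain more training instances from a corpus of limited size.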
Keywords: artificial intelligence; machine learning; natural language processing; recognizing textual entailment; paraphrase detection
MDPI and ACS Style

Rocha, G.; Lopes Cardoso, H. Recognizing Textual Entailment: Challenges in the Portuguese Language. Information 2018, 9, 76.
