Special Issue "Selected Papers from PROPOR 2020"

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Applications".

Deadline for manuscript submissions: closed (15 August 2020).

Special Issue Editors

Dr. Paulo Quaresma
Guest Editor
Informatics Department, University of Évora, 7000-671 Évora, Portugal
Interests: natural language processing; machine learning; AI
Prof. Renata Vieira
Guest Editor
School of Technology, PUCRS, Porto Alegre, Brazil
Interests: Artificial Intelligence; Natural Language Processing; Agents; Computational Linguistics; Cognitive Science
Prof. Sandra Maria Aluísio
Guest Editor
Institute of Mathematical and Computer Sciences (ICMC) (São Carlos), São Paulo, Brazil
Interests: Natural Language Processing and Artificial Intelligence in Education; Corpus Linguistics; PoS tagging; Automatic Adaptation of Portuguese Texts; Semantic Role Labeling and Semantic Resources; Automatic Detection of Discourse Structure; Scientific Writing Tools; Automatic Term Extraction; Computer Adaptive Tests
Dr. Helena Gorete Silva Moniz
Guest Editor
Spoken Language Systems Lab, INESC-ID Lisboa, Lisboa, Portugal
Interests: Prosody; Educational Linguistics; Speech Processing; Machine Translation

Special Issue Information

Dear Colleagues,

The International Conference on Computational Processing of Portuguese (PROPOR) is the main event in the area of human language processing that is focused on theoretical and technological issues of written and spoken Portuguese. The meeting has been a very rich forum for the exchange of ideas and partnerships for the research communities dedicated to the automated processing of Portuguese, promoting the development of methodologies, resources, and projects that can be shared among researchers and practitioners in the field.

We call for papers describing work on any topic related to computational language and speech processing of Portuguese by researchers in the industry or academia. Topics of interest include but are not limited to:

  • Human speech production, perception, and communication;
  • Linguistic description and theories;
  • Natural language processing tasks (e.g., parsing, word sense disambiguation, coreference resolution);
  • Natural language processing applications (e.g., question answering, subtitling, summarization, sentiment analysis);
  • Speech technologies (e.g., spoken language generation, speech and speaker recognition, spoken language understanding);
  • Speech applications (e.g., spoken language interfaces, dialogue systems, speech-to-speech translation);
  • Resources, standardization and evaluation (e.g., corpora, ontologies, lexicons, grammars);
  • Language and speech processing in academic disciplines;
  • Portuguese language variants and dialect processing (including the language varieties of Portugal, Brazil, Cape Verde, Guinea-Bissau, Mozambique, Angola, São Tomé, Macau or Galiza);
  • Multilingual studies, methods, applications, and resources, including Portuguese.

PROPOR 2020 will be the 14th edition of the biennial PROPOR conference, hosted alternately in Brazil and in Portugal. Past meetings were held in Lisbon, PT (1993); Curitiba, BR (1996); Porto Alegre, BR (1998); Évora, PT (1999); Atibaia, BR (2000); Faro, PT (2003); Itatiaia, BR (2006); Aveiro, PT (2008); Porto Alegre, BR (2010); Coimbra, PT (2012); São Carlos, BR (2014); Tomar, PT (2016); and Canela, BR (2018). More detailed information is available at https://propor.di.uevora.pt/.

The authors of a number of selected high-quality full papers will be invited after the conference to submit revised and extended versions of their originally accepted conference papers to this Special Issue of Information, published by MDPI in open access. The selection of these best papers will be based on their ratings in the conference review process, the quality of their presentation during the conference, and their expected impact on the research community. Each submission to this Special Issue should contain at least 50% new material, e.g., in the form of technical extensions, more in-depth evaluations, or additional use cases, as well as a changed title, abstract, and keywords. These extended submissions will undergo peer review according to the journal’s rules. At least two members of the technical committee will act as reviewers for each extended article submitted to this Special Issue; if needed, additional external reviewers will be invited to guarantee a high-quality reviewing process.

Dr. Paulo Quaresma
Prof. Renata Vieira
Prof. Sandra Maria Aluísio
Dr. Helena Gorete Silva Moniz
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (6 papers)

Research

Article
Data-Driven Critical Tract Variable Determination for European Portuguese
Information 2020, 11(10), 491; https://doi.org/10.3390/info11100491 - 21 Oct 2020
Cited by 1 | Viewed by 748
Abstract
Technologies such as real-time magnetic resonance imaging (RT-MRI) can provide valuable information to advance our understanding of the static and dynamic aspects of speech by contributing to the determination of which articulators are essential (critical) in producing specific sounds and how (gestures). While a visual analysis and comparison of imaging data or vocal tract profiles can already provide relevant findings, the sheer amount of available data demands, and can strongly profit from, unsupervised data-driven approaches. Recent work, in this regard, has asserted the possibility of determining critical articulators from RT-MRI data by considering a representation of vocal tract configurations based on landmarks placed on the tongue, lips, and velum, yielding meaningful results for European Portuguese (EP). Advancing this previous work to obtain a characterization of EP sounds grounded on Articulatory Phonology, important to explore critical gestures and advance, for example, articulatory speech synthesis, entails the consideration of a novel set of tract variables. To this end, this article explores critical variable determination considering a vocal tract representation aligned with Articulatory Phonology and the Task Dynamics framework. The overall results, obtained considering data for three EP speakers, show the applicability of this approach and are consistent with existing descriptions of EP sounds.
(This article belongs to the Special Issue Selected Papers from PROPOR 2020)

Article
Benchmarking Natural Language Inference and Semantic Textual Similarity for Portuguese
Information 2020, 11(10), 484; https://doi.org/10.3390/info11100484 - 15 Oct 2020
Cited by 1 | Viewed by 744
Abstract
Two sentences can be related in many different ways. Distinct tasks in natural language processing aim to identify different semantic relations between sentences. We developed several models for natural language inference and semantic textual similarity for the Portuguese language. We took advantage of pre-trained models (BERT); additionally, we studied the roles of lexical features. We tested our models on several datasets (ASSIN, SICK-BR, and ASSIN2), and the best results were usually achieved with ptBERT-Large, trained on a Brazilian corpus and tuned on the latter datasets. Besides obtaining state-of-the-art results, this is, to the best of our knowledge, the most comprehensive study of natural language inference and semantic textual similarity for the Portuguese language.

Article
The BioVisualSpeech Corpus of Words with Sibilants for Speech Therapy Games Development
Information 2020, 11(10), 470; https://doi.org/10.3390/info11100470 - 02 Oct 2020
Viewed by 776
Abstract
In order to develop computer tools for speech therapy that reliably classify speech productions, there is a need for speech production corpora that characterize the target population in terms of age, gender, and native language. Apart from including correct speech productions, in order to characterize the target population, the corpora should also include samples from people with speech sound disorders. In addition, the annotation of the data should include information on the correctness of the speech productions. Following these criteria, we collected a corpus that can be used to develop computer tools for speech and language therapy of Portuguese children with sigmatism. The proposed corpus contains European Portuguese children’s word productions in which the words have sibilant consonants. The corpus has productions from 356 children from 5 to 9 years of age. Some important characteristics of this corpus that are relevant to speech and language therapy and computer science research are that (1) the corpus includes data from children with speech sound disorders; and (2) the productions were annotated according to the criteria of speech and language pathologists, and have information about the speech production errors. These are relevant features for the development and assessment of speech processing tools for speech therapy of Portuguese children. In addition, as an illustration of how to use the corpus, we present three speech therapy games that use a convolutional neural network sibilant classifier trained with data from this corpus and a word recognition module trained on additional children’s data and calibrated and evaluated with the collected corpus.

Article
Evaluating Richer Features and Varied Machine Learning Models for Subjectivity Classification of Book Review Sentences in Portuguese
Information 2020, 11(9), 437; https://doi.org/10.3390/info11090437 - 11 Sep 2020
Viewed by 934
Abstract
Texts published on social media have been a valuable source of information for companies and users, as the analysis of this data helps to improve and select products and services of interest. Due to the huge amount of data, techniques for automatically analyzing user opinions are necessary. The research field that investigates these techniques is called sentiment analysis. This paper focuses specifically on the task of subjectivity classification, which aims to predict whether a text passage conveys an opinion. We report the study and comparison of machine learning methods of different paradigms to perform subjectivity classification of book review sentences in Portuguese, which have been shown to be a challenging domain in the area. Specifically, we explore richer features for the task, using several lexical, centrality-based and discourse features. We show the contributions of the different feature sets and provide evidence that the combination of lexical, centrality-based and discourse features produces better results than any of the feature sets individually. Additionally, by analyzing the achieved results and the knowledge acquired by some symbolic machine learning methods, we show that some discourse relations may clearly signal subjectivity. Our corpus annotation also reveals some distinctive discourse structuring patterns for sentence subjectivity.

Article
Developing Amaia: A Conversational Agent for Helping Portuguese Entrepreneurs—An Extensive Exploration of Question-Matching Approaches for Portuguese
Information 2020, 11(9), 428; https://doi.org/10.3390/info11090428 - 01 Sep 2020
Viewed by 945
Abstract
This paper describes how we tackled the development of Amaia, a conversational agent for Portuguese entrepreneurs. After introducing the domain corpus used as Amaia’s Knowledge Base (KB), we make an extensive comparison of approaches for automatically matching user requests with Frequently Asked Questions (FAQs) in the KB, covering Information Retrieval (IR), approaches based on static and contextual word embeddings, and a model of Semantic Textual Similarity (STS) trained for Portuguese, which achieved the best performance. We further describe how we decreased the model’s complexity and improved scalability, with minimal impact on performance. In the end, Amaia combines an IR library and an STS model with reduced features. Towards a more human-like behavior, Amaia can also answer out-of-domain questions, based on a second corpus integrated in the KB. Such interactions are identified with a text classifier, also described in the paper.

Article
Modeling the Paraphrase Detection Task over a Heterogeneous Graph Network with Data Augmentation
Information 2020, 11(9), 422; https://doi.org/10.3390/info11090422 - 01 Sep 2020
Cited by 2 | Viewed by 887
Abstract
Paraphrase detection is a Natural Language Processing (NLP) task that aims at automatically identifying whether two sentences convey the same meaning (even with different words). For the Portuguese language, most works model this task as a machine-learning solution, extracting features and training a classifier. In this paper, following a different line, we explore a graph structure representation and model the paraphrase identification task over a heterogeneous network. We also adopt a back-translation strategy for data augmentation to balance the dataset we use. Our approach, although simple, outperforms the best results reported for the paraphrase detection task in Portuguese, showing that graph structures may better capture the semantic relatedness among sentences.
