Special Issue "Advances in Machine Learning Methods for Natural Language Processing and Computational Linguistics"

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: 27 June 2022 | Viewed by 2748

Special Issue Editors

Prof. Dr. Manuel Vilares-Ferro
Guest Editor
Department of Computer Science, University of Vigo, 32004 Ourense, Spain
Interests: theory of formal languages; machine translation; artificial intelligence; information extraction
Prof. Dr. Pavel Brazdil
Guest Editor
Laboratory of Artificial Intelligence & Decision Support, INESC TEC, 4200-465 Porto, Portugal
Interests: machine learning; data mining; metalearning; knowledge discovery in databases; text mining; automatic summarization
Prof. Dr. Gaël Dias
Guest Editor
University of Caen Normandy, CNRS, GREYC UMR 6072, F-14032 Caen Cedex, France
Interests: natural language processing; information retrieval; affective computing

Special Issue Information

Dear Colleagues,

Machine learning (ML) algorithms can be used to analyze vast volumes of information, identify patterns and generate models capable of recognizing them in new data instances. This allows us to address complex tasks with the only constraint being the necessity of a suitable training database.

Furthermore, today's digital society provides access to a vast range of raw data, but also generates the need to manage them effectively. This makes natural language processing (NLP), a collective term for the automatic computational treatment of human languages, for which purely symbolic techniques show clear limitations, a popular field for exploiting ML capabilities. The same is true of computational linguistics (CL), which is more concerned with the study of linguistics.

However, this collaborative framework must be based on a formally well-informed strategy to ensure its reliability. In this context, this Special Issue focuses both on the application of ML techniques to solve NLP and CL tasks and on the generation of the linguistic resources that enable this. One example is the construction of syntactic structures without recourse to treebanks for training, which would greatly simplify the implementation of statistical parsers, especially when dealing with out-of-domain scenarios or low-resource languages. As a more applied topic, we could address the generation of models that allow efficient contextual representations, a nontrivial task when dealing with large-scale or multiple documents, but one that is essential for language understanding.

Prof. Dr. Manuel Vilares-Ferro
Prof. Dr. Pavel Brazdil
Prof. Dr. Gaël Dias
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • ML-based tools for CL and NLP
  • Domain-specific and low-resource languages
  • Generation of training resources from raw data
  • Halting conditions and over-/underfitting detection
  • Integration of symbolic and model-based processing
  • Reasoning about large and multiple documents
  • Sampling strategies

Published Papers (4 papers)


Research

Article
Intent-Controllable Citation Text Generation
Mathematics 2022, 10(10), 1763; https://doi.org/10.3390/math10101763 - 21 May 2022
Abstract
We study the problem of controllable citation text generation by introducing a new concept to generate citation texts. Citation text generation, as an assistive writing approach, has drawn the attention of a number of researchers. However, current research on citation text generation rarely addresses how to generate citation texts that satisfy the citation intents specified by the paper’s authors, especially at the beginning of paper writing. We propose a controllable citation text generation model that extends pre-trained sequence-to-sequence models, namely BART and T5, by using the citation intent as a control code, so that the generated citation text meets the paper authors’ citation intent. Experimental results demonstrate that our model can generate citation texts that are semantically similar to the reference citation texts and satisfy the given citation intent. Additionally, the results from human evaluation indicate that incorporating the citation intent may enable the models to generate relevant citation texts almost as scientific paper authors do, even when only a little information from the citing paper is available.
Article
Incorporating Phrases in Latent Query Reformulation for Multi-Hop Question Answering
Mathematics 2022, 10(4), 646; https://doi.org/10.3390/math10040646 - 19 Feb 2022
Viewed by 286
Abstract
In multi-hop question answering (MH-QA), the machine needs to infer the answer to a given question from multiple documents. Existing models usually use entities as the basic units of the reasoning path. They then use relevant entities (in the same sentence or document) to expand the path and update the information of these entities to complete the QA. This process might add an entity that is irrelevant to the answer to the graph and thus lead to incorrect predictions. It is further observed that state-of-the-art methods are susceptible to reasoning chains that pivot on compound entities. To address this deficiency, we present the incorporating phrases in latent query reformulation method (IP-LQR), which incorporates phrases into the latent query reformulation to improve the cognitive ability of the model for multi-hop question answering. Specifically, IP-LQR utilizes information from relevant contexts to reformulate the question in the semantic space. The updated query representations then interact with the contexts within which the answer is hidden. We also design a semantic-augmented fusion method based on the phrase graph, which is then used to propagate the information. IP-LQR is empirically evaluated on a popular MH-QA benchmark, HotpotQA, where it consistently outperforms the state of the art. In summary, by incorporating phrases in the latent query reformulation and employing semantic-augmented embedding fusion, our proposed model can lead to better performance on MH-QA.

Article
MisRoBÆRTa: Transformers versus Misinformation
Mathematics 2022, 10(4), 569; https://doi.org/10.3390/math10040569 - 12 Feb 2022
Viewed by 544
Abstract
Misinformation is considered a threat to our democratic values and principles. The spread of such content on social media polarizes society and undermines public discourse by distorting public perceptions and generating social unrest while lacking the rigor of traditional journalism. Transformers and transfer learning have proved to be state-of-the-art methods for multiple well-known natural language processing tasks. In this paper, we propose MisRoBÆRTa, a novel transformer-based deep neural ensemble architecture for misinformation detection. MisRoBÆRTa takes advantage of two state-of-the-art transformers, i.e., BART and RoBERTa, to improve the performance of discriminating between real news and different types of fake news. We also benchmarked and evaluated the performance of multiple transformers on the task of misinformation detection. For training and testing, we used a large real-world news articles dataset (i.e., 100,000 records) labeled with 10 classes, thus addressing two shortcomings in the current research: (1) increasing the size of the dataset from small to large, and (2) moving the focus of fake news detection from binary classification to multi-class classification. For this dataset, we manually verified the content of the news articles to ensure that they were correctly labeled. The experimental results show that the accuracy of transformers on the misinformation detection problem was significantly influenced by the method employed to learn the context, dataset size, and vocabulary dimension. We observe empirically that, among the classification models that use only one transformer, BART obtains the best accuracy, while DistilRoBERTa obtains the best accuracy in the least amount of time required for fine-tuning and training. However, the proposed MisRoBÆRTa outperforms the other transformer models in the task of misinformation detection. To arrive at this conclusion, we performed ample ablation and sensitivity testing with MisRoBÆRTa on two datasets.

Article
Gulf Countries’ Citizens’ Acceptance of COVID-19 Vaccines—A Machine Learning Approach
Mathematics 2022, 10(3), 467; https://doi.org/10.3390/math10030467 - 31 Jan 2022
Cited by 3 | Viewed by 914
Abstract
The COVID-19 pandemic created a global emergency in many sectors. The spread of the disease can be subdued through timely vaccination. The COVID-19 vaccination process in various countries is ongoing and is slowing down due to multiple factors. Many studies on European countries and the USA have been conducted and have highlighted the public’s concern that over-vaccination results in slowing the vaccination rate. Similarly, we analyzed a collection of data from Gulf countries’ citizens’ COVID-19 vaccine-related discourse shared on social media websites, mainly via Twitter. People’s feedback regarding different types of vaccines needs to be considered to speed up the vaccination process. In this paper, the concerns of people in the Gulf countries are highlighted in order to lessen vaccine hesitancy. The proposed approach accurately identifies the Gulf region-specific concerns related to COVID-19 vaccination using machine learning (ML)-based methods. The collected data were filtered and tokenized, and sentiments were extracted using three different methods: Ratio, TextBlob, and VADER. The sentiment-scored data were classified into positive and negative tweets using a proposed LSTM method. Subsequently, to obtain more confidence in the classification, the in-depth features from the proposed LSTM were extracted and given to four different ML classifiers. The Ratio, TextBlob, and VADER sentiment scores were separately provided to the LSTM and the four machine learning classifiers. The VADER sentiment scores had the best classification results using fine-KNN and Ensemble boost, with 94.01% classification accuracy. Given the improved accuracy, the proposed scheme is robust and confident in classifying and determining sentiments in Twitter discourse.

Planned Papers

The list below contains only planned manuscripts. Some of these manuscripts have not yet been received by the Editorial Office. Papers submitted to MDPI journals are subject to peer review.

Tentative Title: Improving large scale k-nearest neighbor text categorization with label autoencoders
Possible Author: Manuel VILARES
Affiliation: Department of Computer Science, University of Vigo, Ourense 32004, Spain
Abstract: TBA

Tentative Title: Automatic Generation of Domain-Specific Sentiment Lexicon
Possible Author: Pavel BRAZDIL
Affiliation: Laboratory of Artificial Intelligence & Decision Support, INESC TEC, 4200-465 Porto, Portugal
Abstract: TBA

Tentative Title: Prior latent distribution comparison for the RNN variational autoencoder in low-resource modeling
Possible Author: Alexander Gelbukh
Affiliation: Natural Language and Text Processing Laboratory, Center for Computing Research, National Polytechnic Institute, 07738 Mexico City, Mexico
Abstract: TBA

Tentative Title: On the automatic decision about answering questions
Possible Author: Anselmo PEÑAS
Affiliation: Department of Computer Science, UNED, 28040 Madrid, Spain
Abstract: TBA

Tentative Title: Learning Contextualized Models from Dependency Trees
Possible Author: Pablo GAMALLO
Affiliation: Centro Singular de Investigación en Tecnoloxías Intelixentes (CITIUS), University of Santiago de Compostela, 15705 Santiago de Compostela, Spain
Abstract: TBA

Tentative Title: Variational fusion for multimodal sentiment analysis
Possible Author: Alexander Gelbukh
Affiliation: Natural Language and Text Processing Laboratory, Center for Computing Research, National Polytechnic Institute, 07738 Mexico City, Mexico
Abstract: TBA
