Special Issue "Natural Language Processing and Text Mining"

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Systems".

Deadline for manuscript submissions: 30 April 2019

Special Issue Editors

Guest Editor
Dr. Pablo Gamallo

CiTIUS (Centro Singular de Investigación en Tecnoloxías da Información), University of Santiago de Compostela, Santiago de Compostela, Spain
Interests: natural language processing; distributional semantics; information extraction; dependency parsing
Guest Editor
Dr. Marcos Garcia

LyS (Language and Information Society) Group, University of A Coruna, A Coruna, Spain
Interests: natural language processing; computational linguistics; linguistics

Special Issue Information

Dear Colleagues,

Natural language processing (NLP) encompasses a set of linguistically motivated strategies focused on building an interpretable representation from free text. NLP typically makes use of linguistic tasks such as lemmatization, PoS tagging, syntactic analysis, anaphora resolution, semantic role labeling, and so on. Text mining, on the other hand, is a set of Text2Data techniques for discovering and extracting relevant and salient knowledge from large amounts of unstructured text. Its main objective is typically not to understand all or even a large part of what a given speaker/writer has uttered, but rather to extract items of knowledge or regular patterns across a large number of documents, especially Web content and social media. Following recent advances in NLP, machine learning, neural-based deep learning and big data, text mining is now an even more valuable method for connecting linguistic theories with real-world NLP applications aimed at building organized data from unstructured text. Both hidden and new knowledge can be discovered by combining NLP techniques and text mining methods with supervised or unsupervised learning strategies within big data environments.

Authors are invited to submit their papers on any of the following topics (or other related topics):

  • relation extraction (including approaches to open information extraction)
  • named entity recognition
  • entity linking
  • analysis of opinions, emotions and sentiments
  • text clustering, topic modelling, and classification
  • summarization and text simplification
  • co-reference resolution
  • distributional models and semantics
  • multiword and/or terminological extraction
  • entailment and paraphrases
  • discourse analysis
  • question-answering applications

Particular emphasis will be placed on work that makes use of new technologies and innovative linguistic models, or that carries out studies with cross-lingual approaches and minority languages.

Dr. Pablo Gamallo
Dr. Marcos Garcia
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1000 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords


  • Natural Language Processing
  • Text Mining
  • Information Extraction
  • Mining Web and Social Media Contents
  • Text2Data
  • Language Technologies

Published Papers

This special issue is now open for submission; see below for planned papers.

Planned Papers

The list below represents only planned manuscripts. Some of these manuscripts have not been received by the Editorial Office yet. Papers submitted to MDPI journals are subject to peer review.

Title: Multilingual Open Information Extraction: Challenges and Opportunities
Authors: Daniela Barreiro Claro 1,*, Marlo Vieira dos Santos e Souza 1, and Clarissa Castelã Xavier 2
1 FORMAS Research Group, Computer Science Department, Federal University of Bahia, Brazil
2 FORMAS Research Group, Federal University of Rio Grande do Sul (UFRGS), Brazil
Abstract: The number of documents published on the Web in languages other than English grows every year. As a consequence, the need to extract useful information from different languages increases, which highlights the importance of research on Open Information Extraction (OIE) techniques. Different OIE methods deal with features from a single language. On the other hand, few approaches tackle multilingual aspects. In such approaches, multilinguality is treated only as an extraction method, which results in low precision due to the use of general rules. Multilingual methods have been applied to a large number of problems in Natural Language Processing, achieving satisfactory results and demonstrating that knowledge acquired for one language can be transferred to other languages to improve the quality of the facts extracted. We argue that a multilingual approach can enhance OIE methods, is ideal for evaluating and comparing OIE systems and, as a consequence, can be applied to the collected facts. In this work, we discuss how to learn patterns from multilingual approaches through knowledge transfer methods. We evaluate how such patterns can be used to improve the quality of the facts extracted in each language. Moreover, we discuss the importance of a parallel corpus to evaluate and compare multilingual systems.
Keywords: Multilingual; Open Information Extraction; Parallel Corpus

Title: Large Scale Linguistic and Topic Model Analysis of Basque tweets to characterize social interactions of Twitter communities
Authors: Joseba Fernández de Landa, Rodrigo Agerri, Iñaki Alegria
Affiliation: IXA NLP Group, University of the Basque Country UPV/EHU, Donostia-San Sebastián, Spain
Abstract: Social networks like Twitter are becoming increasingly important, creating new ways of communication between humans. They are also useful tools for social and linguistic research, because a lot of public data based on social interaction over such networks is available. The availability of this data is particularly important for a less-resourced language such as Basque, because it allows us to apply current Natural Language Processing techniques to large amounts of unstructured data in order to analyze both the linguistic and social behavior of users based on the text they generate. In this work, the aim is to study the linguistic and social aspects of young and adult people’s behavior based on the tweets they generate.
With this aim in mind, we have gathered over 6 million tweets written in Basque from more than 8000 users. First, we extracted and analyzed the crawled data to find the most popular topics. Then we classified each user’s tweet by age (young/adult) by establishing whether the writing style of each tweet is formal/informal. Several classification and topic modeling methods, both supervised and unsupervised, have been applied, each offering different insights into the feasibility of the task with the available data. Second, we establish the relations that emerge between the users based on their retweets, also characterizing the various communities that arise from the analysis of the data.

Title: Event extraction and representation from online news: a case study for the Portuguese language
Authors: Paulo Quaresma
Affiliation: 1 Research Center on Information Technologies (CiTIUS), University of Santiago de Compostela, Spain
2 Institute of Heritage Sciences (Incipit), Spanish National Research Council (CSIC), Spain
Abstract: Text information extraction is an important NLP task, which aims to automatically identify, extract and represent information in texts. In this context, event extraction plays a relevant role, allowing actions, agents, objects, places and time periods to be identified and represented. The extracted information can be represented by specialized ontologies, supporting knowledge reasoning and inference processes. Moreover, this process allows the creation of a timeline, which is a simple and intuitive visualization mechanism for events.
In this work, we describe in detail our proposal for event extraction from textual online news for the Portuguese language. The proposed approach is based on a pipeline of specialized natural language processing tools, namely, a part-of-speech tagger, a named entity recognizer, a dependency parser, a semantic role labeller, and a knowledge extraction module. The architecture is language independent, but its modules are language dependent and can be built using adequate AI methodologies (rule-based or machine learning ones). The developed system was evaluated with a corpus of online Portuguese news and the obtained results are presented and analyzed. Current limitations and future work are discussed in detail.

Title: Assisting forensic identification through unsupervised information extraction of free text autopsy reports: the disappearances cases during the Brazilian military dictatorship
Authors: Patricia Martin-Rodilla 1,*, Marcia L. Hattori 2, and Cesar Gonzalez-Perez 2
Affiliation: 1 Research Center on Information Technologies (CiTIUS), University of Santiago de Compostela, Spain
2 Institute of Heritage Sciences (Incipit), Spanish National Research Council (CSIC), Spain
Abstract: Anthropological, archaeological and medical studies situate enforced disappearance as one of the strategies associated with the Brazilian military dictatorship (1964-1985), leaving hundreds of disappeared persons whose identity and cause of death are, to this day, unknown. The forensic and police reports of the cases of disappearance are the only clue that investigators have for the identification of these persons, and also for the detection of possible crimes associated with the cases. Their analysis requires unsupervised techniques (since the contextual annotation of these large reports is extremely time-consuming, prone to subjectivity and difficult to obtain) that assist researchers in the identification and analysis in four directions: common causes of death, relevant body locations, personal belongings terminology and correlations between actors (for example, doctors, police officers or other actors involved in the disappearances).
This paper analyses almost 3000 textual reports of cases of missing persons in the city of Sao Paulo during the Brazilian military dictatorship through unsupervised information extraction algorithms for Portuguese, identifying named entities and relevant terminology associated with these four criteria. The analysis allowed us to observe terminological patterns present in the reports that are relevant for identifying people (e.g. the presence of rings or similar personal belongings with high relevance for the identification) and to automate the study of correlations between actors.
The proposed system acts as a first classificatory and indexing middleware for the reports based on these criteria and represents a feasible system that assists researchers in the pattern search among autopsy reports.

Keywords: Information Extraction; Named Entity Recognition; Terminology Extraction; Autopsy Reports





Information EISSN 2078-2489, published by MDPI AG, Basel, Switzerland