Special Issue "Text Mining in Complex Domains"

A special issue of Multimodal Technologies and Interaction (ISSN 2414-4088).

Deadline for manuscript submissions: closed (30 June 2019).

Special Issue Editors

Guest Editor
Dr. Suzan Verberne

Leiden University, Leiden, The Netherlands
Interests: Text Mining; Information Retrieval; Professional Search; User Aspects; Evaluation
Guest Editor
Dr. Iris Hendrickx

Radboud University, Nijmegen, The Netherlands
Interests: Text Mining; Digital Humanities; Lexical and Relational Semantics; Co-reference Resolution; Named Entity Recognition; Automatic Summarization

Special Issue Information

Dear Colleagues,

There is an abundance of text data available in a variety of domains. These data offer great potential for knowledge discovery if the texts can be effectively disclosed with data mining techniques. However, text data are challenging for data mining because they are typically unstructured, often noisy, and open-ended: newly added documents bring new vocabulary and thus new features.

In developing Text Mining methods, every domain has its own unique challenges. Examples of complex text types that have gained the attention of researchers in the past decade are scientific publications, historic documents, patents, electronic health records, policy documents, and social media data. Text Mining research has its roots in the Natural Language Processing community as well as the Information Retrieval community, and receives attention from many application domains. We seek more coherence in Text Mining research by bringing together papers that approach text mining from different angles.

This special issue invites submissions on the following topics:

  • Text mining methods, among which: named entity recognition, relation extraction, text categorization, text summarization, authorship detection, sentiment analysis
  • Text mining applications for complex domains
  • Domain adaptation for text mining methods
  • Evaluation of text mining methods
  • Pre-processing pipelines for text mining
  • Natural Language Processing for text mining
  • Information Retrieval for text mining
  • User interfacing for text mining
  • User studies addressing text mining applications
  • Methods for mining text with images

Dr. Suzan Verberne
Dr. Iris Hendrickx
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Multimodal Technologies and Interaction is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1000 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Text Mining
  • Natural Language Processing
  • Information Retrieval
  • Domain adaptation
  • Evaluation
  • Knowledge Discovery

Published Papers (2 papers)

Research

Open Access Article
Data-Driven Lexical Normalization for Medical Social Media
Multimodal Technologies Interact. 2019, 3(3), 60; https://doi.org/10.3390/mti3030060 (registering DOI)
Received: 30 June 2019 / Revised: 9 August 2019 / Accepted: 13 August 2019 / Published: 20 August 2019
Abstract
In the medical domain, user-generated social media text is increasingly used as a valuable complementary knowledge source to scientific medical literature. The extraction of this knowledge is complicated by colloquial language use and misspellings. However, lexical normalization of such data has not been addressed effectively. This paper presents a data-driven lexical normalization pipeline with a novel spelling correction module for medical social media. Our method significantly outperforms state-of-the-art spelling correction methods and can detect mistakes with an F1 of 0.63 despite extreme imbalance in the data. We also present the first corpus for spelling mistake detection and correction in a medical patient forum.
(This article belongs to the Special Issue Text Mining in Complex Domains)
Open Access Article
Unsupervised Keyphrase Extraction for Web Pages
Multimodal Technologies Interact. 2019, 3(3), 58; https://doi.org/10.3390/mti3030058
Received: 28 June 2019 / Revised: 15 July 2019 / Accepted: 26 July 2019 / Published: 31 July 2019
Abstract
Keyphrase extraction is an important part of natural language processing (NLP) research, although little research is done in the domain of web pages. The World Wide Web contains billions of pages that are potentially interesting for various NLP tasks, yet it remains largely untouched in scientific research. Current research is often only applied to clean corpora such as abstracts and articles from academic journals or sets of scraped texts from a single domain. However, textual data from web pages differs from normal text documents, as it is structured using HTML elements and often consists of many small fragments. These elements are furthermore used in a highly inconsistent manner and are likely to contain noise. We evaluated the keyphrases extracted by several state-of-the-art extraction methods and found that they did not transfer well to web pages. We therefore propose WebEmbedRank, an adaptation of a recently proposed extraction method that can make use of structural information in web pages in a robust manner. We compared this novel method to other baselines and state-of-the-art methods using a manually annotated dataset and found that WebEmbedRank achieved significant improvements over existing extraction methods on web pages.
(This article belongs to the Special Issue Text Mining in Complex Domains)

Multimodal Technologies Interact. EISSN 2414-4088. Published by MDPI AG, Basel, Switzerland.