Special Issue "Text Mining in Complex Domains"

A special issue of Multimodal Technologies and Interaction (ISSN 2414-4088).

Deadline for manuscript submissions: closed (30 June 2019).

Special Issue Editors

Dr. Suzan Verberne
Guest Editor
Leiden University, Leiden, The Netherlands
Interests: Text Mining; Information Retrieval; Professional Search; User Aspects; Evaluation
Dr. Iris Hendrickx
Guest Editor
Radboud University, Nijmegen, The Netherlands
Interests: Text Mining; Digital Humanities; Lexical and Relational Semantics; Co-reference Resolution; Named Entity Recognition; Automatic Summarization

Special Issue Information

Dear Colleagues,

There is an abundance of text data available in a variety of domains. These data offer great potential for knowledge discovery if the texts can be effectively disclosed with data mining techniques. However, text data are challenging for data mining because they are typically unstructured, often noisy, and open-ended: newly added documents bring new vocabulary and thus new features.

In developing Text Mining methods, every domain has its own unique challenges. Examples of complex text types that have gained the attention of researchers in the past decade are scientific publications, historic documents, patents, electronic health records, policy documents, and social media data. Text Mining research has its roots in the Natural Language Processing community as well as the Information Retrieval community, and receives attention from many application domains. We seek more coherence in Text Mining research by bringing together papers that approach text mining from different angles.

This special issue invites submissions on the following topics:

 

  • Text mining methods, including named entity recognition, relation extraction, text categorization, text summarization, authorship detection, and sentiment analysis
  • Text mining applications for complex domains
  • Domain adaptation for text mining methods
  • Evaluation of text mining methods
  • Pre-processing pipelines for text mining
  • Natural Language Processing for text mining
  • Information Retrieval for text mining
  • User interfacing for text mining
  • User studies addressing text mining applications
  • Methods for mining text with images

 

Dr. Suzan Verberne
Dr. Iris Hendrickx
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Multimodal Technologies and Interaction is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Text Mining
  • Natural Language Processing
  • Information Retrieval
  • Domain adaptation
  • Evaluation
  • Knowledge Discovery

Published Papers (4 papers)

Research

Open Access Article
Reorganize Your Blogs: Supporting Blog Re-visitation with Natural Language Processing and Visualization
Multimodal Technol. Interact. 2019, 3(4), 66; https://doi.org/10.3390/mti3040066 - 07 Oct 2019
Viewed by 1134
Abstract
Temporally connected personal blogs contain voluminous textual content, presenting challenges in re-visiting and reflecting on experiences. Other data repositories have benefited from natural language processing (NLP) and interactive visualizations (VIS) to support exploration, but little is known about how these techniques could be used with blogs to present experiences and support multimodal interaction with blogs, particularly for authors. This paper presents the effect of reorganization (reorganizing the large blog set with NLP and presenting abstract topics with VIS) in supporting novel re-visitation experiences with blogs. The BlogCloud tool, a blog re-visitation tool that reorganizes blog paragraphs around user-searched keywords, implements reorganization and similarity-based content grouping. Through a public use session with bloggers who wrote about extended hikes, we observed the effect of NLP-based reorganization in delivering novel re-visitation experiences. Findings suggest that the re-presented topics provide new reflection materials and re-visitation paths, enabling interaction with symbolic items in memory.
(This article belongs to the Special Issue Text Mining in Complex Domains)

Open Access Article
Text Mining in Cybersecurity: Exploring Threats and Opportunities
Multimodal Technol. Interact. 2019, 3(3), 62; https://doi.org/10.3390/mti3030062 - 15 Sep 2019
Cited by 2 | Viewed by 1557
Abstract
The number of cyberattacks on organizations is growing. To increase cyber resilience, organizations need to obtain foresight to anticipate cybersecurity vulnerabilities, developments, and potential threats. This paper describes a tool that combines state-of-the-art text mining and information retrieval techniques to explore the opportunities of using these techniques in the cybersecurity domain. Our tool, the Horizon Scanner, can scrape and store data from websites, blogs, and PDF articles; search a database based on a user query; show textual entities in a graph; and provide and visualize potential trends. The aim of the Horizon Scanner is to help experts explore relevant data sources for potential threats and trends and to speed up the process of foresight. In a requirements session and user evaluation of the tool with cyber experts from the Dutch Defense Cyber Command, we explored whether the Horizon Scanner tool has the potential to fulfill its aim in the cybersecurity domain. Although the overall evaluation of the tool was not as good as expected, some aspects of the tool were found to have added value, providing us with valuable insights into how to design decision support for forecasting analysts.

Open Access Article
Data-Driven Lexical Normalization for Medical Social Media
Multimodal Technol. Interact. 2019, 3(3), 60; https://doi.org/10.3390/mti3030060 - 20 Aug 2019
Cited by 4 | Viewed by 1221
Abstract
In the medical domain, user-generated social media text is increasingly used as a valuable complementary knowledge source to scientific medical literature. The extraction of this knowledge is complicated by colloquial language use and misspellings. However, lexical normalization of such data has not been addressed effectively. This paper presents a data-driven lexical normalization pipeline with a novel spelling correction module for medical social media. Our method significantly outperforms state-of-the-art spelling correction methods and can detect mistakes with an F1 of 0.63 despite extreme imbalance in the data. We also present the first corpus for spelling mistake detection and correction in a medical patient forum.

Open Access Article
Unsupervised Keyphrase Extraction for Web Pages
Multimodal Technol. Interact. 2019, 3(3), 58; https://doi.org/10.3390/mti3030058 - 31 Jul 2019
Cited by 1 | Viewed by 1226
Abstract
Keyphrase extraction is an important part of natural language processing (NLP) research, although little research has been done in the domain of web pages. The World Wide Web contains billions of pages that are potentially interesting for various NLP tasks, yet it remains largely untouched in scientific research. Current research is often only applied to clean corpora such as abstracts and articles from academic journals or sets of scraped texts from a single domain. However, textual data from web pages differ from normal text documents, as they are structured using HTML elements and often consist of many small fragments. These elements are furthermore used in a highly inconsistent manner and are likely to contain noise. We evaluated the keyphrases extracted by several state-of-the-art extraction methods and found that they did not transfer well to web pages. We therefore propose WebEmbedRank, an adaptation of a recently proposed extraction method that can make use of structural information in web pages in a robust manner. We compared this novel method to other baselines and state-of-the-art methods using a manually annotated dataset and found that WebEmbedRank achieved significant improvements over existing extraction methods on web pages.
