Text Mining in Complex Domains

A special issue of Multimodal Technologies and Interaction (ISSN 2414-4088).

Deadline for manuscript submissions: closed (30 June 2019) | Viewed by 22372

Special Issue Editors


Dr. Suzan Verberne
Guest Editor
Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
Interests: text mining; information retrieval; professional search; user aspects; evaluation

Dr. Iris Hendrickx
Guest Editor
Faculty of Arts, Radboud University, Nijmegen, The Netherlands
Interests: text mining; digital humanities; lexical and relational semantics; co-reference resolution; named entity recognition; automatic summarization

Special Issue Information

Dear Colleagues,

Text data are abundant in a wide variety of domains. These data offer great potential for knowledge discovery if the texts can be made accessible with data mining techniques. However, text data are challenging for data mining because they are typically unstructured, often noisy, and open-ended: newly added documents bring new vocabulary and thus new features.

Every domain poses its own challenges for the development of text mining methods. Complex text types that have gained researchers' attention in the past decade include scientific publications, historical documents, patents, electronic health records, policy documents, and social media data. Text mining research has its roots in both the Natural Language Processing community and the Information Retrieval community, and it receives attention from many application domains. With this special issue, we seek more coherence in text mining research by bringing together papers that approach it from different angles.

This special issue invites submissions on the following topics:

  • Text mining methods, including named entity recognition, relation extraction, text categorization, text summarization, authorship detection, and sentiment analysis
  • Text mining applications for complex domains
  • Domain adaptation for text mining methods
  • Evaluation of text mining methods
  • Pre-processing pipelines for text mining
  • Natural Language Processing for text mining
  • Information Retrieval for text mining
  • User interfacing for text mining
  • User studies addressing text mining applications
  • Methods for mining text with images

Dr. Suzan Verberne
Dr. Iris Hendrickx
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Multimodal Technologies and Interaction is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Text Mining
  • Natural Language Processing
  • Information Retrieval
  • Domain adaptation
  • Evaluation
  • Knowledge Discovery

Published Papers (4 papers)

Research

20 pages, 2632 KiB  
Article
Reorganize Your Blogs: Supporting Blog Re-visitation with Natural Language Processing and Visualization
by Shuo Niu, D. Scott McCrickard, Timothy L. Stelter, Alan Dix and G. Don Taylor
Multimodal Technol. Interact. 2019, 3(4), 66; https://doi.org/10.3390/mti3040066 - 07 Oct 2019
Cited by 1 | Viewed by 4435
Abstract
Temporally-connected personal blogs contain voluminous textual content, presenting challenges in re-visiting and reflecting on experiences. Other data repositories have benefited from natural language processing (NLP) and interactive visualizations (VIS) to support exploration, but little is known about how these techniques could be used with blogs to present experiences and support multimodal interaction with blogs, particularly for authors. This paper presents the effect of reorganization (reorganizing the large blog set with NLP and presenting abstract topics with VIS) to support novel re-visitation experiences to blogs. The BlogCloud tool, a blog re-visitation tool that reorganizes blog paragraphs around user-searched keywords, implements reorganization and similarity-based content grouping. Through a public use session with bloggers who wrote about extended hikes, we observed the effect of NLP-based reorganization in delivering novel re-visitation experiences. Findings suggest that the re-presented topics provide new reflection materials and re-visitation paths, enabling interaction with symbolic items in memory.
(This article belongs to the Special Issue Text Mining in Complex Domains)

15 pages, 1533 KiB  
Article
Text Mining in Cybersecurity: Exploring Threats and Opportunities
by Maaike H. T. de Boer, Babette J. Bakker, Erik Boertjes, Mike Wilmer, Stephan Raaijmakers and Rick van der Kleij
Multimodal Technol. Interact. 2019, 3(3), 62; https://doi.org/10.3390/mti3030062 - 15 Sep 2019
Cited by 8 | Viewed by 7250
Abstract
The number of cyberattacks on organizations is growing. To increase cyber resilience, organizations need to obtain foresight to anticipate cybersecurity vulnerabilities, developments, and potential threats. This paper describes a tool that combines state-of-the-art text mining and information retrieval techniques to explore the opportunities of using these techniques in the cybersecurity domain. Our tool, the Horizon Scanner, can scrape and store data from websites, blogs and PDF articles, and search a database based on a user query, show textual entities in a graph, and provide and visualize potential trends. The aim of the Horizon Scanner is to help experts explore relevant data sources for potential threats and trends and to speed up the process of foresight. In a requirements session and user evaluation of the tool with cyber experts from the Dutch Defense Cyber Command, we explored whether the Horizon Scanner tool has the potential to fulfill its aim in the cybersecurity domain. Although the overall evaluation of the tool was not as good as expected, some aspects of the tool were found to have added value, providing us with valuable insights into how to design decision support for forecasting analysts.
(This article belongs to the Special Issue Text Mining in Complex Domains)

27 pages, 792 KiB  
Article
Data-Driven Lexical Normalization for Medical Social Media
by Anne Dirkson, Suzan Verberne, Abeed Sarker and Wessel Kraaij
Multimodal Technol. Interact. 2019, 3(3), 60; https://doi.org/10.3390/mti3030060 - 20 Aug 2019
Cited by 8 | Viewed by 4154
Abstract
In the medical domain, user-generated social media text is increasingly used as a valuable complementary knowledge source to scientific medical literature. The extraction of this knowledge is complicated by colloquial language use and misspellings. However, lexical normalization of such data has not been addressed effectively. This paper presents a data-driven lexical normalization pipeline with a novel spelling correction module for medical social media. Our method significantly outperforms state-of-the-art spelling correction methods and can detect mistakes with an F1 of 0.63 despite extreme imbalance in the data. We also present the first corpus for spelling mistake detection and correction in a medical patient forum.
(This article belongs to the Special Issue Text Mining in Complex Domains)

12 pages, 1099 KiB  
Article
Unsupervised Keyphrase Extraction for Web Pages
by Tim Haarman, Bastiaan Zijlema and Marco Wiering
Multimodal Technol. Interact. 2019, 3(3), 58; https://doi.org/10.3390/mti3030058 - 31 Jul 2019
Cited by 4 | Viewed by 5819
Abstract
Keyphrase extraction is an important part of natural language processing (NLP) research, although little research is done in the domain of web pages. The World Wide Web contains billions of pages that are potentially interesting for various NLP tasks, yet it remains largely untouched in scientific research. Current research is often only applied to clean corpora such as abstracts and articles from academic journals or sets of scraped texts from a single domain. However, textual data from web pages differ from normal text documents, as it is structured using HTML elements and often consists of many small fragments. These elements are furthermore used in a highly inconsistent manner and are likely to contain noise. We evaluated the keyphrases extracted by several state-of-the-art extraction methods and found that they did not transfer well to web pages. We therefore propose WebEmbedRank, an adaptation of a recently proposed extraction method that can make use of structural information in web pages in a robust manner. We compared this novel method to other baselines and state-of-the-art methods using a manually annotated dataset and found that WebEmbedRank achieved significant improvements over existing extraction methods on web pages.
(This article belongs to the Special Issue Text Mining in Complex Domains)