Open Access Article
Molecules 2018, 23(5), 1028; https://doi.org/10.3390/molecules23051028

ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers

1 School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, China
2 School of Computer Science and Network Security, Dongguan University of Technology, Dongguan, Guangdong 523808, China
* Author to whom correspondence should be addressed.
Received: 9 April 2018 / Revised: 22 April 2018 / Accepted: 25 April 2018 / Published: 27 April 2018
(This article belongs to the Special Issue Molecular Computing and Bioinformatics)

Abstract

A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods to unstructured text. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, using three types of named entity recognition (NER) tasks as a demonstration. Results show that, in most cases, parallel processing greatly improves processing efficiency, and that the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to biomedical text mining tasks other than NER.
Keywords: biomedical text mining; big data; Tianhe-2; parallel computing; load balancing
This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
MDPI and ACS Style

Xing, Y.; Wu, C.; Yang, X.; Wang, W.; Zhu, E.; Yin, J. ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers. Molecules 2018, 23, 1028.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Molecules, EISSN 1420-3049. Published by MDPI AG, Basel, Switzerland.