
Open Access Article
Molecules 2017, 22(12), 2116

An Interface for Biomedical Big Data Processing on the Tianhe-2 Supercomputer

College of Computer, National University of Defense Technology, Changsha 410073, China
Beijing Genomics Institute (BGI) Shenzhen, Shenzhen 518083, China
National Supercomputing Center of Guangzhou, Guangzhou 510006, China
School of Data and Computer Science, Sun Yat-Sen University, Guangzhou 510000, China
Authors to whom correspondence should be addressed.
Received: 25 October 2017 / Accepted: 29 November 2017 / Published: 1 December 2017


Big data, cloud computing, and high-performance computing (HPC) are on the verge of convergence. Cloud computing already plays an active part in big data processing through frameworks such as Hadoop and Spark. The recent upsurge of high-performance computing in China provides additional capacity to address the challenges associated with big data. In this paper, we propose Orion, a big data interface on the Tianhe-2 supercomputer, which enables big data applications to run on Tianhe-2 via a single command or a shell script. Orion supports multiple users, and each user can launch multiple tasks. Through automated configuration, it minimizes the effort needed to initiate big data applications on Tianhe-2. Orion follows the “allocate-when-needed” paradigm and thus avoids idle occupation of computational resources. We tested the utility and performance of Orion on a large genomic dataset and achieved satisfactory performance on Tianhe-2 with very few modifications to existing applications implemented in Hadoop/Spark. In summary, Orion provides a practical and economical interface for big data processing on Tianhe-2.
Keywords: big data; Tianhe-2; Hadoop; Spark; genomics big data
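
The “allocate-when-needed” idea mentioned in the abstract can be illustrated with a small toy model. The sketch below is not Orion's implementation or interface; the names `NodePool`, `allocate`, and `run_task` are hypothetical and exist only for illustration. It assumes a fixed pool of compute nodes and shows nodes being claimed when a task starts and returned as soon as it ends, so nothing stays reserved between tasks.

```python
from contextlib import contextmanager


class NodePool:
    """Toy stand-in for a cluster scheduler (hypothetical, not Orion's API)."""

    def __init__(self, total_nodes):
        self.free = total_nodes
        self.held = {}  # (user, task) -> number of nodes currently held

    @contextmanager
    def allocate(self, user, task, nodes):
        # Nodes are claimed only at the moment a task starts.
        if nodes > self.free:
            raise RuntimeError(f"only {self.free} nodes free, {nodes} requested")
        self.free -= nodes
        self.held[(user, task)] = nodes
        try:
            yield nodes
        finally:
            # Nodes return to the pool as soon as the task ends, even on failure.
            self.free += self.held.pop((user, task))


def run_task(pool, user, task, nodes, job):
    """Allocate, run the job, release; nothing is reserved between tasks."""
    with pool.allocate(user, task, nodes) as n:
        return job(n)


pool = NodePool(total_nodes=8)
result = run_task(pool, "alice", "wordcount", 4, lambda n: f"ran on {n} nodes")
```

Keying the bookkeeping on a (user, task) pair mirrors the abstract's statement that several users can each launch several tasks; a real scheduler would also queue requests that do not fit rather than raising an error.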


This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).


MDPI and ACS Style

Yang, X.; Wu, C.; Lu, K.; Fang, L.; Zhang, Y.; Li, S.; Guo, G.; Du, Y. An Interface for Biomedical Big Data Processing on the Tianhe-2 Supercomputer. Molecules 2017, 22, 2116.



Molecules, EISSN 1420-3049, published by MDPI AG, Basel, Switzerland