Open Access Article
Appl. Sci. 2018, 8(9), 1514; https://doi.org/10.3390/app8091514

Integrated High-Performance Platform for Fast Query Response in Big Data with Hive, Impala, and SparkSQL: A Performance Evaluation

1 Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung 81148, Taiwan
2 Department of Fragrance and Cosmetic Science, Kaohsiung Medical University, Kaohsiung 80708, Taiwan
* Author to whom correspondence should be addressed.
Received: 6 August 2018 / Revised: 24 August 2018 / Accepted: 26 August 2018 / Published: 1 September 2018
(This article belongs to the Special Issue Selected Papers from IEEE ICKII 2018)

Abstract

This paper integrates three big data tools (Hive, Impala, and SparkSQL) that support SQL-like queries for rapid data retrieval in big data. The three tools are not only suitable for business intelligence workloads that require high-performance data retrieval, but are also open-source software, which makes them a low-cost solution for small-to-medium enterprises. In practice, the proposed approach provides an in-memory cache and an in-disk cache so that a query receives a very fast response when a cache hit occurs. Moreover, this paper develops a platform selection mechanism that chooses the appropriate tool for each input query, both effectively and efficiently. As a result, job execution with the proposed approach using platform selection is 2.63 times faster than Hive in the Case 1 experiment and 4.57 times faster in the Case 2 experiment.
Keywords: big data tool; SQL-like query; in-memory cache; tool selection; performance index
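
The flow described in the abstract (check an in-memory cache, then an in-disk cache, and only on a miss select a platform and run the query) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the class `TwoTierQueryCache`, the functions `select_platform` and `run_query`, the JSON file layout, and in particular the size-versus-memory thresholds, which are hypothetical and are not the selection rule used in the paper. The engines are passed in as plain callables, so no real Hive, Impala, or SparkSQL client API is implied.

```python
import hashlib
import json
import os
from typing import Callable, Dict, List, Optional, Tuple

Rows = List[list]


class TwoTierQueryCache:
    """Illustrative cache: a dict in memory backed by JSON files on disk."""

    def __init__(self, disk_dir: str) -> None:
        self.memory: Dict[str, Rows] = {}
        self.disk_dir = disk_dir
        os.makedirs(disk_dir, exist_ok=True)

    @staticmethod
    def key(sql: str) -> str:
        # Simplification: collapse whitespace so trivially reformatted
        # queries share a key. A real system would need a proper SQL parser.
        normalized = " ".join(sql.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def _path(self, key: str) -> str:
        return os.path.join(self.disk_dir, key + ".json")

    def lookup(self, sql: str) -> Optional[Tuple[Rows, str]]:
        """Return (rows, tier) on a hit, or None on a miss."""
        k = self.key(sql)
        if k in self.memory:
            return self.memory[k], "memory-cache"
        path = self._path(k)
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                rows = json.load(f)
            self.memory[k] = rows  # promote to the faster tier
            return rows, "disk-cache"
        return None

    def store(self, sql: str, rows: Rows) -> None:
        k = self.key(sql)
        self.memory[k] = rows
        with open(self._path(k), "w", encoding="utf-8") as f:
            json.dump(rows, f)


def select_platform(estimated_input_gb: float, free_memory_gb: float) -> str:
    """Hypothetical rule (NOT the paper's): prefer in-memory engines
    when the estimated input fits comfortably in free memory."""
    if estimated_input_gb <= 0.5 * free_memory_gb:
        return "impala"
    if estimated_input_gb <= free_memory_gb:
        return "sparksql"
    return "hive"


def run_query(
    sql: str,
    cache: TwoTierQueryCache,
    engines: Dict[str, Callable[[str], Rows]],
    estimated_input_gb: float,
    free_memory_gb: float,
) -> Tuple[Rows, str]:
    """Serve from cache if possible; otherwise pick an engine and cache the result."""
    hit = cache.lookup(sql)
    if hit is not None:
        return hit
    platform = select_platform(estimated_input_gb, free_memory_gb)
    rows = engines[platform](sql)
    cache.store(sql, rows)
    return rows, platform
```

In this sketch the second element of the returned tuple reports where the answer came from (`"memory-cache"`, `"disk-cache"`, or the name of the engine that ran the query), which makes the cache-hit path easy to observe when experimenting with stub engines.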

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

MDPI and ACS Style

Chang, B.R.; Tsai, H.-F.; Lee, Y.-D. Integrated High-Performance Platform for Fast Query Response in Big Data with Hive, Impala, and SparkSQL: A Performance Evaluation. Appl. Sci. 2018, 8, 1514.

Appl. Sci. EISSN 2076-3417. Published by MDPI AG, Basel, Switzerland.