Open Access Article

Hadoop Performance Analysis Model with Deep Data Locality

Department of Computer Science, University of Wisconsin-Whitewater, Whitewater, WI 53190, USA
Department of Computer Science, University of Nevada, Las Vegas, NV 89154, USA
Author to whom correspondence should be addressed.
This paper is an extended version of our presentation in the 2018 IEEE International Conference on Big Data, Seattle, WA, USA, 10–13 December 2018.
Information 2019, 10(7), 222
Received: 11 June 2019 / Revised: 23 June 2019 / Accepted: 26 June 2019 / Published: 27 June 2019
(This article belongs to the Special Issue Big Data Research, Development, and Applications––Big Data 2018)


Background: Hadoop has become the base framework for big data systems, built on the simple concept that moving computation is cheaper than moving data. Hadoop increases data locality in the Hadoop Distributed File System (HDFS) to improve system performance: the more data that is local to the machine processing it, the less network traffic flows among the nodes of the system. Previous research increased data locality in a single MapReduce stage to improve Hadoop performance. However, there is currently no mathematical performance model for data locality in Hadoop. Methods: This study builds a Hadoop performance analysis model with data locality that covers the entire MapReduce process. The paper explains the data locality concept in the map stage and the shuffle stage, and shows how to apply the performance analysis model to improve Hadoop performance through deep data locality. Results: Three tests (a simulation-based test, a cloud test, and a physical test) demonstrated that deep data locality increases Hadoop performance. In these tests, deep data locality improved Hadoop performance by over 34%. Conclusions: Deep data locality improved Hadoop performance by reducing data movement in HDFS.
Keywords: MapReduce; Hadoop; data locality; HDFS; deep data locality


This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).


MDPI and ACS Style

Lee, S.; Jo, J.-Y.; Kim, Y. Hadoop Performance Analysis Model with Deep Data Locality. Information 2019, 10, 222.


Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Information, EISSN 2078-2489, published by MDPI AG, Basel, Switzerland