Open Access Article

Hadoop Cluster Deployment: A Methodological Approach

1 Departamento de Matematica e Computação, Sao Paulo State University (UNESP), Presidente Prudente 19060-900, Brazil
2 Instituto de Ciencias Matematicas e Computacao, University of Sao Paulo (USP), Sao Carlos 13566-590, Brazil
* Author to whom correspondence should be addressed.
Information 2018, 9(6), 131; https://doi.org/10.3390/info9060131
Received: 27 February 2018 / Revised: 24 May 2018 / Accepted: 25 May 2018 / Published: 29 May 2018
(This article belongs to the Special Issue Information Technology: New Generations (ITNG 2017))
For a long time, data was treated as a general problem because it represents only fragments of an event, with no relevant purpose of its own. The last decade, however, has been about information and how to obtain it. To find meaning in data and to address scalability problems, many frameworks have been developed to improve data storage and analysis. Among them, Hadoop was presented as a powerful tool for dealing with large amounts of data. However, doubts remain about how to handle its deployment and whether there is a reliable method for comparing the performance of distinct Hadoop clusters. This paper presents a methodology based on benchmark analysis to guide Hadoop cluster deployment. The experiments employed Apache Hadoop and the Hadoop distributions of Cloudera, Hortonworks, and MapR, analyzing the architectures both locally and in the cloud, using centralized and geographically distributed servers. The results show that the methodology can be dynamically applied to obtain a reliable comparison among different architectures. Additionally, the study suggests that the knowledge acquired can be used to improve the data analysis process through a better understanding of the Hadoop architecture.
Keywords: benchmark methodology; Hadoop; Big Data; computational models

MDPI and ACS Style

Correia, R.C.M.; Spadon, G.; De Andrade Gomes, P.H.; Eler, D.M.; Garcia, R.E.; Olivete Junior, C. Hadoop Cluster Deployment: A Methodological Approach. Information 2018, 9, 131.
