MAPSkew: Metaheuristic Approaches for Partitioning Skew in MapReduce
Abstract
MapReduce is a parallel computing model in which a large dataset is split into smaller parts that are processed in parallel on multiple machines. Because of its simplicity, MapReduce has been widely used in various application domains, and it can significantly reduce the time needed to process large amounts of data. However, when data are not uniformly distributed, so-called partitioning skew arises: the allocation of tasks to machines becomes unbalanced, either because the distribution function splits the dataset unevenly or because part of the data is more complex and requires greater computational effort. To address this problem, we propose an approach based on metaheuristics. For evaluation purposes, three metaheuristics were implemented: Simulated Annealing, Local Beam Search, and Stochastic Beam Search. Our experimental evaluation, using a MapReduce implementation of the Bron-Kerbosch clique algorithm, shows that the proposed method can find good partitionings while better balancing data among machines.
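The abstract does not describe MAPSkew's cost model, neighbourhood structure, or parameters, so the following Python sketch is only a rough illustration of the general idea, not the authors' method. It assumes each intermediate key has a known processing cost (which can capture both uneven key counts and keys that are more expensive to process) and treats partitioning as assigning keys to reducers so that the load of the busiest reducer (the makespan) is minimised. A CRC32-based hash partitioner stands in for the default distribution function. Simulated annealing and local beam search, two of the three metaheuristics named above, then improve on it. All function names, the single-key reassignment move, the cooling schedule, and the sample costs are hypothetical.

```python
import math
import random
import zlib


def makespan(assignment, costs, n_reducers):
    """Load of the busiest reducer: the quantity a skew-aware partitioner minimises."""
    loads = [0.0] * n_reducers
    for key, reducer in assignment.items():
        loads[reducer] += costs[key]
    return max(loads)


def hash_partition(costs, n_reducers):
    """Baseline (assumed): hash partitioning of string keys, ignoring per-key cost."""
    return {key: zlib.crc32(key.encode()) % n_reducers for key in costs}


def simulated_annealing(costs, n_reducers, initial, t0=10.0, alpha=0.995,
                        steps=5000, seed=0):
    """Reassign one key per step; accept a worse move with probability exp(-delta/T)."""
    rng = random.Random(seed)
    keys = list(costs)
    current = dict(initial)
    current_cost = makespan(current, costs, n_reducers)
    best, best_cost = dict(current), current_cost
    if n_reducers < 2:
        return best, best_cost  # nothing to rebalance
    temperature = t0
    for _ in range(steps):
        key = rng.choice(keys)
        old = current[key]
        new = rng.randrange(n_reducers - 1)
        if new >= old:  # skip the key's current reducer
            new += 1
        current[key] = new
        candidate_cost = makespan(current, costs, n_reducers)
        delta = candidate_cost - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current_cost = candidate_cost
            if current_cost < best_cost:
                best, best_cost = dict(current), current_cost
        else:
            current[key] = old  # reject: undo the move
        temperature = max(temperature * alpha, 1e-12)  # geometric cooling
    return best, best_cost


def local_beam_search(costs, n_reducers, initial, beam_width=4,
                      neighbours_per_state=8, steps=200, seed=0):
    """Keep the beam_width best states among the current beam and its random neighbours."""
    rng = random.Random(seed)
    keys = list(costs)
    beam = [dict(initial)]
    for _ in range(steps):
        pool = list(beam)  # keeping parents means the best cost never gets worse
        for state in beam:
            for _ in range(neighbours_per_state):
                neighbour = dict(state)
                neighbour[rng.choice(keys)] = rng.randrange(n_reducers)
                pool.append(neighbour)
        pool.sort(key=lambda s: makespan(s, costs, n_reducers))
        beam = pool[:beam_width]
    return beam[0], makespan(beam[0], costs, n_reducers)


if __name__ == "__main__":
    # Hypothetical skewed workload: a few keys carry most of the cost.
    weights = [50, 30, 20, 10, 10, 5, 5, 5, 3, 2]
    costs = {f"key{i}": float(w) for i, w in enumerate(weights)}
    n = 3
    baseline = hash_partition(costs, n)
    print("hash partitioning makespan:", makespan(baseline, costs, n))
    print("simulated annealing makespan:", simulated_annealing(costs, n, baseline)[1])
    print("local beam search makespan:", local_beam_search(costs, n, baseline)[1])
```

Both searches start from the hash assignment and only ever report a solution at least as good as it. No run can go below max(largest key cost, total cost / reducers), which is 50 for the sample workload. A stochastic beam search variant would replace the deterministic "keep the best" selection with sampling that favours lower-makespan states.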
Cite This Article
Pericini, M.H.M.; Leite, L.G.M.; De Carvalho-Junior, F.H.; Machado, J.C.; Rezende, C.A. MAPSkew: Metaheuristic Approaches for Partitioning Skew in MapReduce. Algorithms 2019, 12, 5.
Pericini MHM, Leite LGM, De Carvalho-Junior FH, Machado JC, Rezende CA. MAPSkew: Metaheuristic Approaches for Partitioning Skew in MapReduce. Algorithms. 2019; 12(1):5.
Chicago/Turabian Style
Pericini, Matheus H.M.; Leite, Lucas G.M.; De Carvalho-Junior, Francisco H.; Machado, Javam C.; Rezende, Cenez A. 2019. "MAPSkew: Metaheuristic Approaches for Partitioning Skew in MapReduce." Algorithms 12, no. 1: 5.
Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.