
Distributed Balanced Partitioning via Linear Embedding

Google Research, 76 Ninth Ave, New York, NY 10011, USA
* Author to whom correspondence should be addressed.
This article is an extended version of our paper published in Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, San Francisco, CA, USA, 22–25 February 2016.
Authors contributed equally to this work.
Algorithms 2019, 12(8), 162; https://doi.org/10.3390/a12080162
Received: 18 July 2019 / Revised: 5 August 2019 / Accepted: 7 August 2019 / Published: 10 August 2019
(This article belongs to the Special Issue Graph Partitioning: Theory, Engineering, and Applications)

Abstract

Balanced partitioning is often a crucial first step in solving large-scale graph optimization problems. In some cases, a big graph can be chopped into pieces that each fit on one machine and are processed independently before the results are stitched together, at the cost of some suboptimality arising from the interaction among the pieces. In other cases, links between different parts show up in the running time and/or network communication cost, hence the desire for a small cut size. We study a distributed balanced-partitioning problem in which the goal is to partition the vertices of a given graph into k pieces so as to minimize the total cut size. Our algorithm is composed of a few steps that are easily implementable in distributed computation frameworks such as MapReduce. The algorithm first embeds the nodes of the graph onto a line, and then processes the nodes in a distributed manner guided by the linear embedding order. We examine various ways to find the initial embedding, for example, via hierarchical clustering or Hilbert curves. We then apply four different techniques: local swaps, minimum cuts on the boundaries of partitions, contraction, and dynamic programming. In our empirical study, we compare these techniques with each other and with previous work on distributed graph algorithms, for example, a label-propagation method, FENNEL, and Spinner. We report results both on a private map graph and on several public social networks, and show that our results beat previous distributed algorithms: for instance, compared to the label-propagation algorithm, we report an improvement of 15–25% in the cut value. We also observe that our algorithms admit scalable distributed implementations for any number of partitions.
Finally, we explain three applications of this work at Google: (1) Balanced partitioning is used to route multi-term queries to different replicas in the Google Search backend in a way that reduces cache miss rates by ≈0.5%, which leads to a double-digit gain in throughput of production clusters. (2) Applied to Google Maps Driving Directions, balanced partitioning minimizes the number of cross-shard queries with the goal of saving CPU usage. This system achieves load balancing by dividing the world graph into several “shards”. Live experiments demonstrate an ≈40% drop in the number of cross-shard queries when compared to a standard geography-based method. (3) In a job scheduling problem for our data centers, we use balanced partitioning to evenly distribute the work while minimizing the amount of communication across geographically distant servers. In fact, the hierarchical nature of our solution goes well with the layering of data center servers, where certain machines are closer to each other and have faster links to one another.
Keywords: cut minimization; embedding to line; imbalance; local improvement; MapReduce; maps; partitioning; social networks
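
The pipeline outlined in the abstract (order the nodes on a line, cut the line into k balanced pieces, then improve the cut locally) can be illustrated with a small single-machine sketch. This is not the authors' distributed implementation: the function names `partition_by_order`, `cut_size`, and `local_swaps` are hypothetical, the linear order is assumed to be given, and the swap step is a naive greedy pass standing in for the local-swap technique mentioned above.

```python
from typing import Dict, Hashable, Iterable, Sequence, Tuple

Vertex = Hashable
Edge = Tuple[Vertex, Vertex]


def partition_by_order(order: Sequence[Vertex], k: int) -> Dict[Vertex, int]:
    """Cut a linear ordering into k contiguous blocks whose sizes differ by at most 1."""
    n = len(order)
    if k <= 0 or k > n:
        raise ValueError("k must be between 1 and the number of vertices")
    # Vertex at position i goes to block floor(i * k / n).
    return {v: (i * k) // n for i, v in enumerate(order)}


def cut_size(edges: Iterable[Edge], part: Dict[Vertex, int]) -> int:
    """Count edges whose endpoints lie in different blocks."""
    return sum(1 for u, v in edges if part[u] != part[v])


def local_swaps(
    order: Sequence[Vertex], edges: Iterable[Edge], part: Dict[Vertex, int]
) -> Dict[Vertex, int]:
    """Greedy refinement (illustrative assumption, not the paper's procedure).

    Repeatedly swaps the blocks of two vertices whenever that strictly
    reduces the cut. Swapping keeps every block size unchanged, so the
    balance of the initial partition is preserved.
    """
    edge_list = list(edges)
    part = dict(part)  # work on a copy
    best = cut_size(edge_list, part)
    improved = True
    while improved:
        improved = False
        for u in order:
            for v in order:
                if part[u] == part[v]:
                    continue
                part[u], part[v] = part[v], part[u]
                new = cut_size(edge_list, part)
                if new < best:
                    best = new
                    improved = True
                else:
                    # Undo a swap that does not help.
                    part[u], part[v] = part[v], part[u]
    return part
```

The quadratic swap loop is only suitable for toy inputs; it is meant to show why a good initial ordering matters, since contiguous blocks of a poor ordering start with a large cut that local moves then have to repair.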

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
Aydin, K.; Bateni, M.; Mirrokni, V. Distributed Balanced Partitioning via Linear Embedding. Algorithms 2019, 12, 162.
