One of the enduring issues of spatial origin-destination (OD) flow data analysis is the computational inefficiency or even the impossibility to handle large datasets. Despite the recent advancements in high performance computing (HPC) and the ready availability of powerful computing infrastructure, we argue that the best solutions are based on a thorough understanding of the fundamental properties of the data. This paper focuses on overcoming the computational challenge through data reduction that intelligently takes advantage of the heavy-tailed distributional property of most flow datasets. We specifically propose the classification technique of head/tail breaks to this end. We test this approach with representative algorithms from three common method families, namely flowAMOEBA from flow clustering, Louvain from network community detection, and PageRank from network centrality algorithms. A variety of flow datasets are adopted for the experiments, including inter-city travel flows, cellphone call flows, and synthetic flows. We propose a standard evaluation framework to evaluate the applicability of not only the selected three algorithms, but any given method in a systematic way. The results prove that head/tail breaks can significantly improve the computational capability and efficiency of flow data analyses while preserving result quality, on condition that the analysis emphasizes the “head” part of the dataset or the flows with high absolute values. We recommend considering this easy-to-implement data reduction technique before analyzing a large flow dataset.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited