Interactive and Online Buffer-Overlay Analytics of Large-Scale Spatial Data

Abstract: Buffer and overlay analysis are fundamental operations that are widely used in Geographic Information Systems (GIS) for resource allocation, land planning, and other related fields. Real-time buffer and overlay analysis of large-scale spatial data remains challenging because the computational scale of conventional data-oriented methods grows rapidly with data volume. In this paper, we present HiBO, a visualization-oriented buffer-overlay analysis model that is less sensitive to data volume. In HiBO, the core task is to determine the value of each pixel for display. To this end, we introduce an efficient spatial-index-based buffer generation method and an effective set-transformation-based overlay optimization method. Moreover, we propose a fully optimized hybrid-parallel processing architecture to ensure the real-time capability of HiBO. Experiments on real-world datasets show that our approach is capable of handling ten-million-scale spatial data in real time. An online demonstration of HiBO is provided (http://www.higis.org.cn:8080/hibo).


Introduction
A buffer is defined as the zone within a specified buffer distance around a geographic feature, and an overlay creates a composite map by combining the geometry and attributes of multiple data layers [1]. Buffer and overlay analysis are two basic Geographic Information System (GIS) spatial operations used for resource allocation, land planning, and many other related fields [2,3]. In practical applications, the two operations are usually combined to solve spatial decision problems, a typical example being site selection [4]. In this paper, we use the term buffer-overlay analysis to denote the combined operation.
Buffer-overlay analysis is computationally intensive and time-consuming. Moreover, with the rapid development of surveying and mapping technology, ever greater volumes of spatial data are being produced, which makes this computational limitation increasingly prominent.
Buffer and overlay generation are the key steps of buffer-overlay analysis, and several generation methods have been proposed. By the type of result they produce, these methods fall into two categories: vector-based methods and raster-based methods. Vector-based methods represent results as vector polygons, while raster-based methods use pixel values in raster images to indicate result zones. Table 1 lists the advantages and disadvantages of the two types of methods. Because of their large storage requirements, raster-based methods are generally not applied to large-scale spatial data, and related research mainly focuses on computing raster buffers with a serial computing model [5,6]. For vector-based buffer generation, the edge-constrained triangulation method [7] and the buffer equation approximation strategy [8] are widely used; in addition, several parallel strategies have been proposed to handle vector-based buffer and overlay generation for large-scale data [9-17]. However, their performance is still far from satisfactory: even with high-performance computing technologies, these traditional methods cannot support real-time buffer-overlay analysis. For example, Shen [10] proposed HPBM, a parallel vector buffer generation method based on Spark [18], and conducted an experiment on a high-performance cluster comparing HPBM with three optimized parallel methods and popular GIS software programs (Table 2). As shown in the table, HPBM outperformed the other traditional data-oriented methods, yet it still took around 3 min to generate buffers for 597k linestring objects. As another example, Puri [16] presented MPI-GIS, a parallel GIS system for polygon overlay processing of two GIS layers, which employs an R-tree for efficient indexing and for identifying potentially intersecting sets of polygon objects; with MPI-GIS, processing hundred-thousand-scale datasets still takes on the order of ten seconds.

Method | Advantages | Disadvantages
Vector-based | (i) Small storage space; (ii) no resolution loss while zooming in. | (i) High computational complexity; (ii) distortion while zooming in (see Figure 1a: in vector polygons, circles and circular arcs are simplified to regular polygons or regular-polygon segments).
Raster-based | Low computational complexity in overlay generation. | (i) Large storage space; (ii) sawtooth distortion while zooming in (see Figure 1b).
Figure 2 presents the general buffer-overlay analysis flow based on existing generation methods. In this flow, the buffers of individual spatial objects are generated separately and merged into a buffer layer for each dataset, and the buffer layers are then combined to obtain the final overlay layer. The flow is data-oriented and straightforward, but its computational scale grows rapidly with the number of spatial objects. Moreover, the final overlay layer, which can be extremely large and complex, is ultimately only displayed to users on screens.
Motivated by this observation, we present HiBO, a visualization-oriented parallel buffer-overlay analysis model that provides interactive and online buffer-overlay analysis of large-scale spatial data. In our previous work, we presented HiBuffer, a visualization-oriented buffer analysis method capable of handling large-scale spatial point and linestring data in real time [21]. Our previous experiments showed that HiBuffer generates buffers for every dataset listed in Table 2 in less than 1 s, and that it can provide interactive buffer analysis for much larger datasets. In this paper, we extend HiBuffer to support buffer analysis of polygon objects as well as overlay analysis. The buffer-overlay analysis flow in HiBO is shown in Figure 3; its core task is to determine the value of each pixel for display. To the best of our knowledge, this is a new approach to buffer and overlay analysis, and its main advantage is that it is insensitive to data volume. Experimental results verify that HiBO is capable of handling ten-million-scale data in real time.
The remainder of this paper proceeds as follows: Section 2 introduces the core ideas of the buffer and overlay generation methods in HiBO. Section 3 describes the architecture of HiBO in detail. Section 4 provides an online demonstration, as well as an experiment to validate the performance of HiBO. Conclusions are drawn in Section 5.

Methodology
In this section, we introduce the core ideas of the buffer and overlay generation methods in HiBO. HiBO uses spatial indexes to determine whether a pixel lies in the buffers of spatial objects, and on this basis we propose an efficient buffer generation method named Spatial-Index-Based Buffer Generation (SIBBG). Compared with the SIBBG of our previous work [21], the version presented here is extended to support polygon objects. For overlay analysis, HiBO is designed to support complex mixed set operations on multiple buffer analysis results, and we propose the Set-Transformation-Based Overlay Optimization (STBOO) method.

Spatial-Index-Based Buffer Generation
Spatial indexes are used to optimize spatial queries. The R-tree is an efficient tree data structure widely used for indexing spatial data; it groups nearby objects and represents each group by its Minimum Bounding Rectangle (MBR) at the next higher level of the tree. Spatial queries on R-trees, including bounding-box queries and nearest-neighbor searches, have been thoroughly optimized in both theory and practice [22]. In SIBBG, we employ R-trees to determine whether a pixel lies in the buffers of spatial objects.

SIBBG for Point and Linestring
In SIBBG, we use an R-tree to organize point and linestring objects, with points and segments stored directly as the value types of R-tree nodes (see Table 3). In an R-tree, the intersect operation is efficient only for bounding-box queries (rather than queries with other polygon shapes), and nearest-neighbor search has much higher computational complexity than a bounding-box query. We therefore design SIBBG for point and linestring objects as given in Algorithm 1, introducing inner and outer boxes (Figure 4) to reduce the cost of spatial queries. As a result of these optimizations, the performance of SIBBG is less sensitive to data volume.
Algorithm 1 SIBBG for point and linestring objects [21]. Input: pixel P, radius R, spatial index Rtree. Output: True or False (whether P lies in the buffers of spatial objects with radius R).
SIBBG for Polygon
The buffer generation of polygon objects in HiBO involves two issues: generating the buffers of polygon edges, and filling the areas inside polygon objects. Because polygon edges can be treated as linestring objects, we use the same solution adopted for linestring objects. For the filling problem, we design a multi-level index architecture to accelerate determining whether a point lies inside a polygon object.
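The role of the inner and outer boxes in Figure 4 can be illustrated with the following minimal sketch. It is not a reconstruction of Algorithm 1: it assumes the inner box is the axis-aligned square inscribed in the circle of radius R around P (half-width R/√2) and the outer box is the square circumscribing that circle (half-width R), and it scans a plain list of segments where SIBBG would issue R-tree bounding-box queries. Points are stored as zero-length segments, and all function names are hypothetical.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]            # a point is a zero-length segment
Box = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def segment_intersects_box(s: Segment, box: Box) -> bool:
    """Exact segment/box intersection test (Liang-Barsky clipping)."""
    (x1, y1), (x2, y2) = s
    minx, miny, maxx, maxy = box
    dx, dy = x2 - x1, y2 - y1
    t0, t1 = 0.0, 1.0
    for p, q in ((-dx, x1 - minx), (dx, maxx - x1), (-dy, y1 - miny), (dy, maxy - y1)):
        if p == 0:
            if q < 0:          # parallel to this boundary and outside it
                return False
        else:
            t = q / p
            if p < 0:          # entering the box
                if t > t1:
                    return False
                t0 = max(t0, t)
            else:              # leaving the box
                if t < t0:
                    return False
                t1 = min(t1, t)
    return True


def point_segment_distance(p: Point, s: Segment) -> float:
    (x1, y1), (x2, y2) = s
    dx, dy = x2 - x1, y2 - y1
    length2 = dx * dx + dy * dy
    if length2 == 0:
        return math.hypot(p[0] - x1, p[1] - y1)
    t = max(0.0, min(1.0, ((p[0] - x1) * dx + (p[1] - y1) * dy) / length2))
    return math.hypot(p[0] - (x1 + t * dx), p[1] - (y1 + t * dy))


def in_buffer(p: Point, r: float, segments: Iterable[Segment]) -> bool:
    px, py = p
    h = r / math.sqrt(2.0)
    inner = (px - h, py - h, px + h, py + h)
    outer = (px - r, py - r, px + r, py + r)
    segments = list(segments)
    # 1) Every point of the inner box is within distance r of p.
    if any(segment_intersects_box(s, inner) for s in segments):
        return True
    # 2) Only objects touching the outer box can be within distance r.
    candidates = [s for s in segments if segment_intersects_box(s, outer)]
    # 3) Exact distance check for the remaining candidates only.
    return any(point_segment_distance(p, s) <= r for s in candidates)
```

Most pixels are decided by the two box tests alone; the exact distance is computed only for objects that touch the outer box but miss the inner one.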
As listed in Table 4, we build a two-level R-tree for polygon objects. Each polygon edge is stored as a segment in RtreeB, with additional node information including the ID of the polygon it belongs to (PolygonID) and whether the edge is parallel to the x-axis (IsLevel). The polygon MBRs are stored as boxes in RtreeMBR. The pseudo-code of SIBBG for polygon objects is given in Algorithm 2. As shown in lines 1-8, polygon edges are handled with the same solution adopted for linestring objects. For the areas inside polygon objects, we first use TmpMBR to find the candidate polygons (line 9) and then test the spatial relationship between the pixel and each candidate polygon, one by one, until we find a polygon that contains the pixel (lines 11-22). We apply the ray-casting algorithm [23] to determine whether a pixel is inside a polygon. Specifically, given a pixel and a polygon, we draw a segment (QuerySegment) parallel to the x-axis from the MBR boundary of the polygon to the pixel, and then use RtreeB to count how many times this segment crosses the edges of the polygon. The pixel is inside the polygon if the number of crossings is odd, and outside if it is even; this rule also holds for polygons with inner rings. Moreover, two optimizations (line 10 and lines 14-17) minimize the length of QuerySegment, because a longer QuerySegment may intersect large numbers of edges belonging to other polygons and thus degrade performance.
22: if EdgeCount is odd then return True
23: return False
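The two-level filling test can be sketched as follows: an MBR filter stands in for RtreeMBR, and an even-odd ray cast with a horizontal query segment ending at the pixel stands in for the RtreeB crossing count. This is a minimal sketch under stated assumptions: it scans edges directly instead of querying an R-tree, it does not implement the QuerySegment-shortening optimizations of Algorithm 2, and the function names are hypothetical. Horizontal edges (the IsLevel case) are skipped by the half-open crossing rule.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Ring = Sequence[Point]   # implicitly closed: the last vertex connects to the first
Polygon = List[Ring]     # outer ring followed by any inner rings (holes)


def polygon_mbr(poly: Polygon) -> Tuple[float, float, float, float]:
    xs = [x for ring in poly for x, _ in ring]
    ys = [y for ring in poly for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def contains(poly: Polygon, p: Point) -> bool:
    """Even-odd ray casting; edges of inner rings are counted as well."""
    px, py = p
    crossings = 0
    for ring in poly:
        n = len(ring)
        for k in range(n):
            (x1, y1), (x2, y2) = ring[k], ring[(k + 1) % n]
            # Half-open rule: horizontal edges never count, and a vertex
            # shared by two edges is counted only once.
            if (y1 > py) != (y2 > py):
                x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
                if x_cross < px:  # crossing lies on the segment left of p
                    crossings += 1
    return crossings % 2 == 1


def find_containing_polygon(polys: Dict[str, Polygon], p: Point) -> Optional[str]:
    px, py = p
    for pid, poly in polys.items():
        minx, miny, maxx, maxy = polygon_mbr(poly)
        if not (minx <= px <= maxx and miny <= py <= maxy):
            continue              # level 1: MBR filter
        if contains(poly, p):     # level 2: crossing count against edges
            return pid
    return None
```

Counting the edges of inner rings together with the outer ring is what makes the parity rule work for polygons with holes: a pixel inside a hole sees one extra crossing and becomes "outside".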

Set-Transformation-Based Overlay Optimization
As shown in Figure 5, HiBO supports four basic set operations on buffer analysis results: Intersection (∩), Union (∪), Difference (−), and Complement (∼). Combinations of these operations cover most overlay analysis problems. STBOO reduces the computational load by transforming the set operation expressions of overlay analysis in the following two steps.
Step 1: Simplify expressions. The total computation cost of overlay analysis in HiBO consists of the buffer generation cost and the set operation cost, as given in Equation (1), where Buffer(D_i, R_i) represents the buffer generation of dataset D_i with radius R_i, Operate(O_j) is the process of set operation O_j, and C represents cost:
$$C_{total} = \sum_{i} C(\mathrm{Buffer}(D_i, R_i)) + \sum_{j} C(\mathrm{Operate}(O_j)) \quad (1)$$
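Table 5 presents the expression simplifications of STBOO, each of which reduces this cost. Take the Distributivity Law as an example: the expression (a ∩ b) ∪ (a ∩ c) can be transformed into a ∩ (b ∪ c), which reduces the number of buffer generations from four to three and the number of set operations from three to two. In Table 5, a, b, and c are buffer analysis results, E represents the universal set, and Φ represents the empty set.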
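The effect of such simplifications on Equation (1) can be illustrated with a small bottom-up rewriter. This is a hypothetical sketch, not the rule set of Table 5: it covers only binary Intersection and Union, uses the identity, idempotence, and distributivity laws, matches a shared operand only in the first position (it does not search commuted forms), and counts leaf occurrences as buffer generations.

```python
from typing import Tuple, Union

Expr = Union[str, Tuple]  # leaf name, or (op, left, right) with op "inter"/"union"
E, EMPTY = "E", "PHI"     # universal set and empty set


def is_op(x: Expr, op: str) -> bool:
    return isinstance(x, tuple) and x[0] == op


def simplify(x: Expr) -> Expr:
    if isinstance(x, str):
        return x
    op, a, b = x[0], simplify(x[1]), simplify(x[2])
    if op == "inter":
        if EMPTY in (a, b):
            return EMPTY                 # a ∩ Φ = Φ
        if a == E:
            return b                     # E ∩ b = b
        if b == E or a == b:
            return a                     # a ∩ E = a, a ∩ a = a
        if is_op(a, "union") and is_op(b, "union") and a[1] == b[1]:
            # Distributivity: (x ∪ y) ∩ (x ∪ z) = x ∪ (y ∩ z)
            return simplify(("union", a[1], ("inter", a[2], b[2])))
    elif op == "union":
        if E in (a, b):
            return E                     # a ∪ E = E
        if a == EMPTY:
            return b                     # Φ ∪ b = b
        if b == EMPTY or a == b:
            return a                     # a ∪ Φ = a, a ∪ a = a
        if is_op(a, "inter") and is_op(b, "inter") and a[1] == b[1]:
            # Distributivity: (x ∩ y) ∪ (x ∩ z) = x ∩ (y ∪ z)
            return simplify(("inter", a[1], ("union", a[2], b[2])))
    return (op, a, b)


def cost(x: Expr) -> Tuple[int, int]:
    """(number of buffer generations, number of set operations)."""
    if isinstance(x, str):
        return (0, 0) if x in (E, EMPTY) else (1, 0)
    (b1, o1), (b2, o2) = cost(x[1]), cost(x[2])
    return b1 + b2, o1 + o2 + 1
```

For example, simplify(("union", ("inter", "a", "b"), ("inter", "a", "c"))) yields ("inter", "a", ("union", "b", "c")), and cost drops from (4, 3) to (3, 2).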
Step 2: Reorder parameters. In HiBO, overlay analysis can be decomposed into tasks that determine the value of each pixel according to the set operation expressions. The computation cost of a pixel P can be expressed as in Equation (2), where SIBBG(D_i, R_i, P) represents the process of determining whether P lies in the buffer of D_i with radius R_i, and BoolCalc(O_j) is the process of Boolean operation O_j:
$$C(P) = \sum_{i} C(\mathrm{SIBBG}(D_i, R_i, P)) + \sum_{j} C(\mathrm{BoolCalc}(O_j)) \quad (2)$$
The cost of a Boolean operation is much lower than that of SIBBG, so the BoolCalc term is omitted, and the computation cost of a pixel P is approximately the sum of the SIBBG costs. Because the performance of SIBBG is relatively insensitive to data size, the optimization target of Step 2 is to reduce the number of SIBBG processes.
According to the Commutativity Law of set operations, reordering the parameters of an Intersection or Union does not change the result. Meanwhile, the result of an operation can sometimes be determined once the value of the first parameter is known, without evaluating the second parameter: if x is false, x ∩ y is false, and if x is true, x ∪ y is true. Therefore, the number of SIBBG processes can be reduced by parameter reordering.
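The short-circuit argument can be made concrete with a per-pixel evaluator that records every buffer test it actually runs. In this hypothetical sketch, each leaf name maps to a callable standing in for one SIBBG process; Difference is treated as x ∩ ∼y, so it can also stop early when its first parameter is false.

```python
from typing import Callable, Dict, List, Tuple, Union

Pixel = Tuple[int, int]
Expr = Union[str, Tuple]  # leaf name, ("comp", x), or (op, left, right)


def evaluate(x: Expr, p: Pixel, tests: Dict[str, Callable[[Pixel], bool]],
             calls: List[str]) -> bool:
    """Evaluate a set-operation expression at one pixel, skipping buffer
    tests whose result cannot change the outcome."""
    if isinstance(x, str):
        calls.append(x)                  # one SIBBG-style buffer test
        return tests[x](p)
    if x[0] == "comp":
        return not evaluate(x[1], p, tests, calls)
    op, left, right = x
    if op not in ("inter", "union", "diff"):
        raise ValueError(f"unknown operation: {op}")
    lv = evaluate(left, p, tests, calls)
    if op in ("inter", "diff") and not lv:
        return False                     # false ∩ y = false, false − y = false
    if op == "union" and lv:
        return True                      # true ∪ y = true
    rv = evaluate(right, p, tests, calls)
    return (not rv) if op == "diff" else rv
```

If "small" is false at a pixel, ("inter", "small", "large") costs one buffer test while ("inter", "large", "small") costs two, which is exactly the saving that parameter reordering targets.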
The parameter reordering rules of STBOO are listed in Table 6. Note that parameter reordering is applied only to buffer-overlay analysis of point and linestring objects, because polygon objects involve the filling problem, which makes it hard to estimate the probability that a pixel lies in their buffers. The rules are simple and efficient. For the Intersection or Union of two overlay analysis expressions, we evaluate the expression with fewer input dataset layers first (conditions i and ii). For the Intersection of two buffer analysis expressions, we evaluate the expression with the smaller buffer area first (condition iii), because an expression with a smaller buffer zone area is more likely to be false. For simplicity, we assume that a buffer analysis expression whose dataset is smaller and whose radius is smaller has a smaller buffer zone area. For the Union of two buffer analysis expressions, we evaluate the expression with the larger buffer area first (condition iv), because it is more likely to be true.
Default: do not reorder parameters.
D denotes a dataset of point or linestring objects.
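The reordering heuristics described above can be sketched as a recursive pass over an expression tree. This is an illustrative sketch, not Table 6 itself: the class and function names are hypothetical, dataset "size" is assumed to mean the number of objects, and mixed pairs (one buffer expression and one overlay expression) fall back to the default of no reordering.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Buf:
    """Buffer(D, R) over a point or linestring dataset."""
    name: str
    size: int       # assumed measure of dataset size: number of objects
    radius: float


@dataclass(frozen=True)
class Overlay:
    op: str         # "inter", "union", or "diff"
    left: "Expr"
    right: "Expr"


Expr = Union[Buf, Overlay]


def layers(e: Expr) -> int:
    return 1 if isinstance(e, Buf) else layers(e.left) + layers(e.right)


def reorder(e: Expr) -> Expr:
    if isinstance(e, Buf):
        return e
    l, r = reorder(e.left), reorder(e.right)
    swap = False
    if e.op in ("inter", "union"):          # only commutative operations
        if isinstance(l, Overlay) and isinstance(r, Overlay):
            swap = layers(r) < layers(l)    # fewer input layers first
        elif isinstance(l, Buf) and isinstance(r, Buf):
            r_smaller = r.size < l.size and r.radius < l.radius
            r_larger = r.size > l.size and r.radius > l.radius
            # Intersection: likely-false (smaller area) first;
            # Union: likely-true (larger area) first.
            swap = r_smaller if e.op == "inter" else r_larger
    return Overlay(e.op, r, l) if swap else Overlay(e.op, l, r)
```

When neither operand dominates in both size and radius, the order is left unchanged, matching the "do not reorder" default.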

Architecture
The architecture of HiBO, illustrated in Figure 6, adopts the browser-server model. Specifically, analysis results are organized into a tile pyramid and provided as a Web Map Tile Service (WMTS). Tiles of different levels are selected for screen display according to the zoom level. When a user browses the analysis results, only the tiles within the screen range need to be generated. When the user zooms in, tiles of higher levels with higher resolution are used, so no sawtooth distortion occurs. The server side of HiBO comprises three parts: the Multi-Thread Tile Service (MTTS), the In-Memory Messaging Framework (IMMF), and the Hybrid-Parallel Analysis Engine (HPAE).
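Each tile task ultimately needs the map coordinates of its pixels, which become the pixels P passed to SIBBG. The sketch below assumes a square map extent, 2^level × 2^level tiles at each pyramid level, 256 × 256 pixels per tile (the tile size used in Figure 7), and row 0 at the top; HiBO's actual tiling scheme may differ, and the function names are hypothetical.

```python
from typing import Tuple

Extent = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy), assumed square


def resolution(level: int, extent: Extent, tile_size: int = 256) -> float:
    """Map units per pixel at a level with 2^level x 2^level tiles."""
    minx, _, maxx, _ = extent
    return (maxx - minx) / (tile_size * 2 ** level)


def pixel_center(level: int, col: int, row: int, i: int, j: int,
                 extent: Extent, tile_size: int = 256) -> Tuple[float, float]:
    """Map coordinates of pixel (i, j) in tile (col, row); row 0 is the top row."""
    minx, _, _, maxy = extent
    res = resolution(level, extent, tile_size)
    x = minx + (col * tile_size + i + 0.5) * res
    y = maxy - (row * tile_size + j + 0.5) * res
    return x, y
```

Because the resolution halves at each level, zooming in re-evaluates the analysis at finer pixel spacing instead of enlarging existing pixels, which is why the tiled results show no sawtooth distortion.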

Multi-Thread Tile Service
MTTS encapsulates the buffer-overlay analysis service as a WMTS, and the analysis task within one tile range is treated as an independent task. In the Check&Parse Requests process, MTTS analyzes the tile requests and filters out improper tasks, including (1) tasks whose requests contain invalid operation expressions, and (2) tasks that have already been processed and whose analysis results are still in the Result Pool. In the Render Tiles process, MTTS retrieves analysis results from the Result Pool and renders tiles according to the styles specified in the requests. The analysis results in the Result Pool are stored as two-dimensional Boolean arrays, in which true marks pixels inside the analysis result zones. To improve concurrency, the tile server is multi-threaded.
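The Render Tiles step can be sketched as a mapping from a Boolean result array to raw RGBA pixels. This is a minimal sketch assuming that a request style reduces to a single fill color and a transparent background; the function name is hypothetical, and encoding the bytes into an image format such as PNG would require an imaging library and is omitted.

```python
from typing import Sequence, Tuple

RGBA = Tuple[int, int, int, int]


def render_tile(mask: Sequence[Sequence[bool]], fill: RGBA,
                background: RGBA = (0, 0, 0, 0)) -> bytes:
    """Turn a 2-D Boolean result array into row-major RGBA pixel bytes.

    Pixels inside the analysis result (True) get the fill color from the
    request style; all other pixels get the background color.
    """
    out = bytearray()
    for row in mask:
        for inside in row:
            out.extend(fill if inside else background)
    return bytes(out)
```

Keeping the Result Pool style-free means one analysis result can be rendered in several styles without recomputation.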

In-Memory Messaging Framework
IMMF is a messaging framework based on Redis, an in-memory key-value database. In this framework, tasks and results are transferred in memory without disk I/O. Tasks are stored in a first-in-first-out (FIFO) queue in Redis: new tasks are pushed to the queue and popped by idle (suspended) MPI processes. To avoid errors in parallel processing, the push and pop operations are performed in blocking mode. After a task is finished, its analysis result is written to Redis, and a task completion message is sent to MTTS through the publish/subscribe functions of Redis. To avoid excessive memory use, analysis results are given an expiry time, and expired data is cleaned up once the maximum memory limit is exceeded.
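The message flow of IMMF can be illustrated without a Redis server by a single-process stand-in. This sketch is not Redis and does not use a Redis client: a blocking FIFO queue plays the role of the task list, a dictionary with expiry timestamps plays the role of the result keys, and plain callbacks play the role of publish/subscribe. The class and method names are hypothetical.

```python
import queue
import threading
import time
from typing import Any, Callable, Dict, List, Optional, Tuple


class InMemoryBroker:
    """Single-process stand-in for the Redis-based IMMF (not Redis itself)."""

    def __init__(self, ttl_seconds: float) -> None:
        self._tasks: "queue.Queue[str]" = queue.Queue()   # FIFO task queue
        self._results: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expiry)
        self._lock = threading.Lock()
        self._subscribers: List[Callable[[str], None]] = []
        self._ttl = ttl_seconds

    def push_task(self, task: str) -> None:
        self._tasks.put(task)

    def pop_task(self, timeout: Optional[float] = None) -> str:
        # Blocks until a task is available: an idle worker waits here.
        return self._tasks.get(timeout=timeout)

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self._subscribers.append(callback)

    def put_result(self, key: str, value: Any) -> None:
        with self._lock:
            self._results[key] = (value, time.monotonic() + self._ttl)
        for notify in list(self._subscribers):  # "publish" the completion
            notify(key)

    def get_result(self, key: str) -> Optional[Any]:
        with self._lock:
            item = self._results.get(key)
            if item is None:
                return None
            value, expires_at = item
            if time.monotonic() >= expires_at:  # expired: drop on read
                del self._results[key]
                return None
            return value
```

A worker thread blocked in pop_task plays the role of a suspended MPI process, and a subscriber callback plays the role of MTTS waiting for the completion message.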

Hybrid-Parallel Analysis Engine
HPAE adopts a hybrid MPI-OpenMP parallel processing model to achieve real-time buffer-overlay analysis: each task is processed by multiple OpenMP threads within one MPI process. Because task requests arrive as a stream, tasks are dynamically allocated to MPI processes. After completing its assigned task, an MPI process is suspended until a new task arrives, and tasks are handled on a first-come, first-served basis. This strategy provides good load balancing. An example of task processing is shown in Figure 7.
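The two levels of parallelism can be sketched with Python threads standing in for both MPI processes and OpenMP threads. This is an assumption-laden illustration of the decomposition only: Python threads do not run CPU-bound code in parallel, the contiguous row split merely resembles an OpenMP static schedule, and all names are hypothetical. Each worker blocks on a shared queue (the "suspended" state) and takes tasks first-come, first-served.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

PixelFn = Callable[[int, int], bool]


def row_blocks(n_rows: int, n_threads: int) -> List[range]:
    """Contiguous row chunks, similar in spirit to OpenMP's static schedule."""
    base, extra = divmod(n_rows, n_threads)
    blocks, start = [], 0
    for t in range(n_threads):
        size = base + (1 if t < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def process_tile(pixel_fn: PixelFn, tile_size: int = 256,
                 n_threads: int = 2) -> List[List[bool]]:
    """One task: each thread fills its own block of rows of the tile mask."""
    mask = [[False] * tile_size for _ in range(tile_size)]

    def work(rows: range) -> None:
        for j in rows:
            for i in range(tile_size):
                mask[j][i] = pixel_fn(i, j)

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(work, row_blocks(tile_size, n_threads)))
    return mask


def worker(tasks: "queue.Queue", results: Dict[int, List[List[bool]]],
           make_pixel_fn: Callable[[int], PixelFn],
           tile_size: int, n_threads: int) -> None:
    """Stand-in for one MPI process taking tasks first-come, first-served."""
    while True:
        task = tasks.get()   # blocks ("suspended") until a task arrives
        if task is None:     # sentinel: shut down
            return
        results[task] = process_tile(make_pixel_fn(task), tile_size, n_threads)
```

Dynamic allocation balances load because a worker that finishes a cheap tile immediately takes the next one, rather than waiting for a fixed share of tiles.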

Experimental Evaluation
We conduct an experiment on an SMP server to demonstrate the capability of HiBO to handle ten-million-scale data. HiBO is configured to run with 32 MPI processes and 2 OpenMP threads per process. Table 9 shows the datasets used in the experiment. We test the performance of different types of buffer and overlay analysis requests; for each request type, a test program generates 5000 tasks by randomly requesting tiles at different zoom levels. To measure the performance of the analysis engine accurately, HiBO is configured so that no results are kept in the Result Pool. We analyze the tile rendering logs, and the results are shown in Figure 10, where the tile rendering time distributions of the different request types are visualized as box plots ('•' represents outliers and '×' represents the average rendering time). For buffer analysis, datasets of polygon objects yield poorer performance because polygon objects involve the filling process. The figure also shows that the computing time of an overlay analysis with two buffer layers as inputs is less than the sum of the generation times of the two buffer layers, owing to the optimization strategy of STBOO. Of all request types, Buffer(A1, 50) yields the poorest performance, yet most of its requested tiles are rendered within 0.4 s, and the longest rendering time does not exceed 0.7 s. Since a screen range generally contains no more than 50 tiles, we assume that a browser requests 50 tiles at once. With 32 MPI processes, the 50 tasks are processed in two rounds, with 14 (= 32 processes × 2 rounds − 50 tasks) MPI processes idle in the second round; the request is therefore most likely completed in less than 0.8 s (= 0.4 s × 2). In conclusion, HiBO is able to provide interactive and online buffer-overlay analysis of ten-million-scale spatial data.
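The round-based latency estimate above can be written as a small function. It assumes, as the text does, that each MPI process handles one tile task at a time and that each task takes at most a fixed time (0.4 s covers most tiles of the slowest request type); the function name is hypothetical.

```python
import math
from typing import Tuple


def estimate_batch_latency(n_tasks: int, n_processes: int,
                           per_task_seconds: float) -> Tuple[int, int, float]:
    """Rounds needed, idle processes in the last round, and latency bound."""
    rounds = math.ceil(n_tasks / n_processes)
    idle_in_last_round = rounds * n_processes - n_tasks
    return rounds, idle_in_last_round, rounds * per_task_seconds
```

With 50 tasks, 32 processes, and 0.4 s per task, this gives 2 rounds, 14 idle processes in the second round, and a bound of about 0.8 s, matching the figures in the text.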

Conclusions and Future Work
This paper presents HiBO, a parallel processing model for real-time buffer-overlay analysis of extremely large-scale spatial data. Unlike traditional data-oriented methods, HiBO is visualization-oriented: its core task is transformed into determining the value of each pixel for screen display. In HiBO, we employ R-trees to determine whether a pixel lies in the buffers of spatial objects and propose an efficient buffer generation method named SIBBG. HiBO supports complex mixed set operations on multiple buffer analysis results, and we present an effective overlay optimization method named STBOO. Parallel computing technologies are used to accelerate analysis in HiBO, and we propose a fully optimized hybrid-parallel processing architecture with good load balancing. Experiments on real-world datasets show that our approach is capable of handling ten-million-scale spatial data in real time. In the future, we will apply our approach to the rapid visualization of large-scale vector data.

Figure 1. Distortion of vector-based and raster-based results while zooming in.

Figure 4. Inner and outer boxes of pixel P with a given radius R [21].

Figure 5. Set operations of overlay analysis in HiBO.

Figure 7. Buffer-overlay analysis of a tile range with 256 × 256 pixels in an MPI process with 4 OpenMP threads.

Figure 8. Input of the housing site selection in Spain.

Figure 9. Analysis result of the housing site selection in Spain.

Table 1. Advantages and disadvantages of traditional methods.

Table 2. Performance of traditional buffer generation methods.

Table 3. R-tree for point and linestring objects.

Table 9. Datasets of China.