Article

A Map Tile Data Access Model Based on the Jump Consistent Hash Algorithm

Wei Wang, Xiaojing Yao and Jing Chen
1 College of Information Technology Engineering, Tianjin University of Technology and Education, Tianjin 300391, China
2 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101, China
* Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2022, 11(12), 608; https://doi.org/10.3390/ijgi11120608
Submission received: 11 October 2022 / Revised: 27 November 2022 / Accepted: 5 December 2022 / Published: 6 December 2022
(This article belongs to the Special Issue GIS Software and Engineering for Big Data)

Abstract

Tiled maps are one of the key GIS technologies used in the development and construction of WebGIS in the era of big data, and there is an urgent need for high-performance tile map services hosted on big data GIS platforms. To address the current inefficiency of managing and accessing massive tile map data, this paper proposes a massive tile map data access model based on the jump consistent hash algorithm. Exploiting the uniformity and consistency of a pseudo-random function seeded with the tile key, the algorithm efficiently generates a storage slot for each tile. By recording the slot information in the header of the row key, a uniform distribution of the tiles across the physical cluster nodes is achieved. This effectively solves the hotspotting problem caused by the monotonicity of tile row keys during data access, thereby making full use of the random-access performance of a big data platform and greatly improving concurrent database access. Experiments show that this model can improve the efficiency of tile map data access by more than 39% compared to a direct storage method, confirming the model's advantages in accessing massive tile map data on a big data GIS platform.

1. Introduction

In the era of big data, where everything is connected, geographic information services have moved from map databases toward real-scene 3D models and digital twins [1,2]. The advantages of big data GIS, such as high performance, high availability, scalability, and economy, provide a solid technical guarantee for future WebGIS development and for the construction of real-scene 3D and digital twin systems [3,4,5,6]. One of today's hot issues in big data GIS is how to use a big data technology platform to efficiently store and process the massive, multi-source, heterogeneous tile map image data generated in the construction of smart cities [7,8]. Current research addressing this issue falls mainly into two categories: row-key-based auxiliary index technologies and memory-based distributed computing framework technologies.
Research in the first stream is based on accessing data using the continuously ordered keys in the key-value data model of non-relational databases on big data platforms [9]. Here, the lexicographical order of the keys in the key-value model is used to store key-value tuples on cluster nodes in a distributed manner, and all operations on the data are implemented through row keys [10]. For query operations within a specified geospatial range, if most geographical features with similar spatial locations are concentrated in a narrow range of the row key sequence, the scanning time of query operations on that sequence can be reduced, thereby improving the efficiency of query retrieval. To this end, some scholars use space-filling curves to build auxiliary indexes based on row keys. For example, Hajjaji et al. [11], Shen et al. [12], Zheng et al. [13], and Yu et al. [14] proposed mapping the longitude and latitude coordinates of two-dimensional space into a one-dimensional row key index based on Hilbert curve rules. When querying tile data within a specified spatial range, the longitude and latitude coordinates are first used to calculate the row key range of the tile data stored in the key-value database, and then all of the tiles in the specified spatial range are retrieved from the cluster through a scanning operation [15,16]. However, although a Hilbert curve can improve the efficiency of some spatial query operations to some extent, it cannot group the tiles together for every spatial range that might be specified. The query operation often needs to scan a large range of key values, and in extreme cases it may even need to scan the entire node cluster.
The second stream of research, the distributed memory computing framework, exploits the fact that memory access speed is much faster than the speed of external storage devices, and the cost of memory has dropped significantly in recent years [17,18,19]. Based on the characteristics of spatial query and the Spark distributed memory computing model, Fang et al. [20] designed the spatial region query algorithm for distributed storage, the distributed spatial index, the distributed memory computing framework, and the Spark Streaming spatial query algorithm to provide real-time online spatial query service. Cui et al. [21] and Wang et al. [22] proposed distributed storage and comprehensive processing strategies for spatial big data based on a memory database and a non-relational database to solve the complex problems that arise in big data storage and management. Although such technology significantly improves the reading speed for tile data, it consumes a large amount of the host’s memory resources, requires a high-performance cluster service node, and has several other problems, such as difficult implementation and high operation and maintenance costs.
In summary, existing studies all start from the spatial characteristics of tile data and try to use a single cluster scanning operation to obtain all tiles in a specified spatial range. However, when tile map data are visualized with WebGIS technology, the front-end display components issue concurrent HTTP requests for individual tiles and load the tiles retrieved from the big data GIS platform. Therefore, for the basic base map services that underpin WebGIS-based real-scene 3D and digital twin systems, the key issue is the big data GIS platform's response speed for a single tile request under concurrent conditions, rather than the time efficiency of a single batch retrieval of tiles delimited by a spatial scope. This paper takes the key-value non-relational database of the big data platform as its research object. Based on the platform's architecture and its data access principle, we analyze hotspotting, the core factor that determines the efficiency of accessing tile data on a big data platform, and we use this analysis to create a tile data access model with absolute load balancing characteristics for the non-relational database architectures of big data platforms. The model improves concurrent access performance for massive tile map data and provides high-performance, scalable, and highly available basic tile map services for the construction of real-scene 3D and digital twin systems.

2. The Hotspotting Issue

The adoption of a big data storage platform with good horizontal scalability to store massive structured, semi-structured, and unstructured data is an inevitable choice for coping effectively with the explosive growth of all kinds of data in the era of big data. The excellent horizontal scalability of big data platforms is generally achieved using a non-relational (NoSQL) database architecture. At present, non-relational databases are mainly classified as document databases, wide-column databases, graph databases, and key-value databases. Among these, the key-value database is well suited to storing large amounts of data that do not require complex conditional queries. Since each tile generated by the tile pyramid model has a uniquely determined storage path on disk, this path can be used as the tile's key, allowing a key-value database to store massive tile data.
Big data platforms with horizontally scalable performance generally adopt a master–slave architecture in which the hosts within a cluster are divided into two categories: master nodes and slave (data) nodes. Master nodes receive data requests and forward them to the data nodes that store the key-value tuples. In the process of requesting tile data from a cluster through the WebGIS front end, as shown in Figure 1, the front-end WebGIS software first generates a unique key for each tile within the geographic view, composed of the level z, tile column index x, and tile row index y defined by the Tile Matrix Set of the WMTS standard [23], and then concurrently sends tile retrieval requests with these keys to the big data platform. When the cluster master node receives a tile retrieval request, it forwards the request, based on the key value, to the slave node that stores the tile, which completes the tile data retrieval.
It is easy to see that the key to the cluster's efficiency in responding to a large number of concurrent requests is a uniform distribution of the key-value tuples across the data nodes (load balancing). Since the tile keys are based on the Tile Matrix Set defined in the Web Map Tile Service standard and consist of the map scaling level (z), the column index (x), and the row index (y) of the raster in which the tile is located, the arrangement of the keys has an obviously monotonic, increasing character [24,25,26]. At the time of tile data entry, since the cluster does not know the distribution range of the key values, it stores all of the data in one block of a single data node, and a split is triggered automatically only when the size of the block exceeds a certain threshold (e.g., 10 GB for HBase). Similarly, during query retrieval, because the data is stored in one block, all GET requests must be sent to the same node in the cluster. This uneven distribution of data across the target storage nodes due to row key monotonicity is the decisive factor in creating hotspots when accessing tiled data in non-relational databases, and it leads to a sharp performance degradation [27,28].
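As a toy illustration of this monotonicity (our own example, with an assumed plain "z/x/y" key format rather than any scheme from the paper), the snippet below generates the row keys of a single map view and sorts them the way HBase orders rows; all of them fall into one narrow, contiguous slice of the key space and would therefore be served by a single region.
// Toy illustration (assumed key format "z/x/y", zero-padded): the row keys of one
// map view are lexicographically adjacent, so every request for this view hits the
// key range served by a single region/data node.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class HotspotDemo {
    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        int z = 12;                                    // zoom level of the current view
        for (int x = 3290; x < 3294; x++)              // 4 x 4 tiles in the viewport
            for (int y = 1500; y < 1504; y++)
                keys.add(String.format("%02d/%07d/%07d", z, x, y));
        Collections.sort(keys);                        // lexicographic order, as in HBase
        System.out.println(keys.get(0));               // 12/0003290/0001500
        System.out.println(keys.get(keys.size() - 1)); // 12/0003293/0001503
    }
}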
Thus, to achieve efficient access to tile data, it is necessary to create a data model that enables uniform distribution of the data stored on server nodes to prevent the formation of access hotspots and then take full advantage of the random read performance of key-value non-relational databases.

3. Data Access Model Based on the Jump Consistent Hash Algorithm

Assuming there are n slots for storing key-value tuples on a cluster, an effective strategy for achieving cluster load balancing and avoiding the hotspotting problem is to distribute the key-value data evenly among these slots. If the assigned slot number is recorded in the header of the row key, then, combined with the pre-partitioning technology of the key-value non-relational database, the key-value data can be stored in exactly the corresponding slots. In other words, using the tile's identification information to assign an appropriate slot to each tile is the key to achieving efficient access to tile data on the cluster. To this end, we offer the following proposition:
For any given map view, there exists a function f such that,
N = f(key, n),
holds for any tile in the map view, where key is a combination of characters consisting of the tile’s level (z), its column index (x), and its row index (y), n is the total number of pre-established slots, and N is the target slot number where the key value should be stored. In addition, the function f satisfies consistency and equal probability, where consistency means that there is always a unique N that corresponds to any given key and n, while equal probability means that the value of N is uniformly (with equal probability) distributed on the target slot interval [0, n − 1].
Given the Tile Matrix Set Model used in the process of tile map data generation and retrieval, we propose the tile data access model shown in Figure 2, which has the tile access slot calculation function f as its core.
At the time of data entry, Figure 2 shows several key steps for storing tile data. First, using the level (z), column index (x), and row index (y) defined in the Tile Matrix Set model, a key that uniquely identifies the tile is generated through a function c. Combined with the tile data itself, this forms the original key-value tuple, noted as Tuple(key, value) in Figure 2. Then, taking key as the first parameter and n as the second parameter of function f, the target slot where the tile should be saved is calculated. Next, the slot number is converted to an ASCII code through the mapping function map. This ASCII code is used as the header of the row key, together with the tile index information (key), to form the row key (rowkey, shown in green in Figure 2), and the key in the original tuple is replaced by rowkey. Finally, the PUT method of the big data platform's data storage interface is called to save the Tuple(rowkey, value) data.
Data retrieval is similar to data entry. WebGIS software such as OpenLayers generates a unique identification, consisting of the level (z), column index (x), and row index (y) defined in the Tile Matrix Set model, for each tile in a map view. Using the same function c, the level (z), column index (x), and row index (y) are composed into a key (shown in brown in Figure 2). Taking key as the first parameter and n as the second, the target slot number N is restored. The map function then yields the target slot label, which is the rowkey header. Finally, the rowkey is reassembled, and the GET method of the big data platform's data retrieval interface is called to retrieve the tile data.
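To make the two paths of Figure 2 concrete, the sketch below assembles the row key from (z, x, y) and calls the HBase client's Put and Get methods. It is only an illustrative sketch under our own naming: packZxy plays the role of the function c, jumpHash is a compact Java stand-in for the algorithm detailed in Section 3.1 (with java.util.Random as the seeded pseudo-random source), slotLabel is a one-line placeholder for the map function, and the slot count, table handle, and column names are assumptions made for the example.
import java.util.Random;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TileAccessModel {
    static final int N_SLOTS = 72;                 // pre-defined total slot count n (example value)
    static final byte[] CF  = Bytes.toBytes("f");  // single column family (tall-table design)
    static final byte[] COL = Bytes.toBytes("t");  // column holding the tile image bytes

    // c(z, x, y): pack the tile address into one 64-bit key (bit layout assumed here).
    static long packZxy(int z, long x, long y) {
        return ((long) z << 56) | ((x & 0xFFFFFFFL) << 28) | (y & 0xFFFFFFFL);
    }

    // f(key, n): compact Java stand-in for the jump consistent hash of Section 3.1.
    static int jumpHash(long key, int n) {
        Random rnd = new Random(key);              // consistent: same seed, same sequence
        long b = -1, j = 0;
        while (j < n) {
            b = j;
            j = (long) Math.floor((b + 1) / rnd.nextDouble());
        }
        return (int) b;
    }

    // map(N): one-line placeholder for the slot-label mapping discussed in Section 3.1.
    static byte slotLabel(int slot) {
        return (byte) ('0' + slot);                // printable labels '0'..'w' for 72 slots
    }

    // rowkey = slot label + packed tile key (the metadata byte of Figure 4 is omitted here).
    static byte[] rowKey(int z, long x, long y) {
        long key = packZxy(z, x, y);
        byte[] rk = new byte[9];
        rk[0] = slotLabel(jumpHash(key, N_SLOTS));
        System.arraycopy(Bytes.toBytes(key), 0, rk, 1, 8);
        return rk;
    }

    // Entry path: PUT the tile under its slot-prefixed row key.
    static void putTile(Table tiles, int z, long x, long y, byte[] image) throws Exception {
        tiles.put(new Put(rowKey(z, x, y)).addColumn(CF, COL, image));
    }

    // Retrieval path: rebuild the same row key and GET the tile.
    static byte[] getTile(Table tiles, int z, long x, long y) throws Exception {
        Result r = tiles.get(new Get(rowKey(z, x, y)));
        return r.getValue(CF, COL);
    }
}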

3.1. Target Slot Calculation

The data generated using the tile map pyramid model is uniquely identified by the level (z), horizontal coordinate (x), and vertical coordinate (y) of the raster where the tile is located. In this paper, we use the function c(z, x, y) to obtain the tile's unique identification key. Then, using this key and the total number of slots in the target storage environment as parameters, we call the function f to calculate the target slot number where the key-value tuple should be stored. Assuming that the cluster is predefined with n target storage slots based on the cluster size, the target slot number N calculated by the function f is required to satisfy the following equation:
P_N(key) = 1/n.
That is, for any key, N is distributed uniformly over the target slot interval [0, n − 1], with probability 1/n for each slot.
Function f is implemented using the jump consistent hash algorithm proposed by Lamping and Veach [29]. The core idea of this algorithm is illustrated by Figure 3.
For an arbitrary key value, a pseudo-random function seeded with the key is used to generate a sequence of random numbers, one for each change in the total number of slots. The pseudo-random function ensures that the generated sequence is uniformly (with equal probability) distributed over the interval [0, 1]. In the process of growing the total number of slots from 1 to n (n = 1 to n = 5 are shown in Figure 3), whenever the slot count changes from j to j + 1, a fraction 1/(j + 1) of the keys must migrate from the existing j slots to the newly added (j + 1)-th slot. Since the probability that a key belongs in the new slot is 1/(j + 1), a key whose random value at this step is less than 1/(j + 1) jumps to the new slot.
The jump consistent hash algorithm (Algorithm 1) is described by the following code:
Algorithm 1: Jump Consistent Hash Algorithm
int ch(int k, int n) {
    random.seed(k);                    // Initialize the pseudo-random function with the key
    int b = 0;
    for (int j = 1; j < n; j++) {
        if (random.next() < 1.0 / (j + 1))
            b = j;                     // With probability 1/(j + 1), the tuple jumps to the newly added slot j (the (j + 1)-th slot)
    }
    return b;
}
For n slots, the algorithm performs n − 1 comparisons to calculate the slot corresponding to the key, so its time complexity is O(n). Since the probability that random.next() < 1/(j + 1) is small for larger j, the decision condition in the loop rarely fires: a jump is a low-probability event, while remaining in the current slot is a high-probability event. Based on this observation, the algorithm can be optimized to compute directly the next slot count at which a jump occurs, which reduces the expected time complexity to O(ln(n)). The optimized algorithm (Algorithm 2) is described by the following code:
Algorithm 2: Optimized Jump Consistent Hash Algorithm
int ch(int k, int n) {
    random.seed(k);                    // Initialize the pseudo-random function with the key
    int b = -1, j = 0;
    while (j < n) {
        b = j;
        double r = random.next();      // Uniform pseudo-random number in (0, 1)
        // Jump directly to the next slot count at which the key moves;
        // slot counts whose pseudo-random draw exceeds 1/(j + 1) are skipped
        j = floor((b + 1) / r);
    }
    return b;
}
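As a quick check of the consistency and minimal-migration properties claimed for the algorithm, the snippet below transcribes Algorithm 2 into Java and counts how many of 100,000 keys change slots when n grows from 9 to 10; roughly one tenth of them should move, and repeated calls with the same key and n always return the same slot. Using java.util.Random as the seeded pseudo-random source is our own assumption; Lamping and Veach [29] specify a dedicated 64-bit generator.
import java.util.Random;

public class JumpHashCheck {
    // Java transcription of Algorithm 2 (java.util.Random as the seeded source is an assumption).
    static int ch(long key, int n) {
        Random rnd = new Random(key);   // consistency: the same seed yields the same sequence
        long b = -1, j = 0;
        while (j < n) {
            b = j;                      // stay in the current slot ...
            j = (long) Math.floor((b + 1) / rnd.nextDouble());  // ... then jump forward
        }
        return (int) b;
    }

    public static void main(String[] args) {
        int keys = 100_000, moved = 0;
        for (long k = 0; k < keys; k++)
            if (ch(k, 9) != ch(k, 10)) moved++;
        // Minimal remapping: roughly keys/10 keys should change slots when n grows from 9 to 10.
        System.out.println("moved " + moved + " of " + keys + " keys");
        // Consistency: a fixed (key, n) always maps to the same slot.
        System.out.println(ch(42L, 9) == ch(42L, 9));   // true
    }
}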
The jump consistent hash algorithm has the advantages of requiring no memory and being very fast. However, the algorithm requires the target slot numbers to start at 0 and be consecutive, which means that slots can only be added or removed at the end. To overcome this shortcoming in terms of scalability, the value N produced by Equation (1) is further transformed using the following equation:
slot = map(N),
where slot is the slot label finally recorded in the row key header, and map is the mapping function. A custom map function maps the slot number produced by the jump consistent hash to its final label in the slot pool.
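A minimal sketch of one possible map function follows (our own construction, not the paper's exact implementation): the slot pool is a configurable list of printable ASCII labels, which leaves out the asterisk that must not appear in row keys (Section 3.2.2), so labels can be chosen freely without disturbing the consecutive slot numbers produced by the jump hash.
public class SlotMap {
    private final char[] pool;              // slot pool: position i holds the label of slot i

    SlotMap(char[] pool) { this.pool = pool; }

    char map(int n) {                       // slot = map(N); N must be < pool.length
        return pool[n];
    }

    public static void main(String[] args) {
        // Example pool: 75 printable ASCII labels from '0' to 'z' (the asterisk, code 42,
        // lies below this range and is therefore never used as a slot label).
        char[] pool = new char[75];
        for (int i = 0; i < pool.length; i++) pool[i] = (char) ('0' + i);
        SlotMap m = new SlotMap(pool);
        System.out.println("" + m.map(0) + ' ' + m.map(37) + ' ' + m.map(74));  // 0 U z
    }
}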

3.2. Tile Data Entry

In order to clearly illustrate the application of the model to a specific database, the Apache HBase database, a distributed and scalable big data storage platform, is used as the research object for tile data access. HBase manages huge datasets as key-value tuples and uses key-value mapping to achieve highly consistent, real-time access to big data. Because HBase is open source, maturely developed, and broadly applied across industries, it is widely used in the field of big data platforms.

3.2.1. Table Structure Design

The HBase database is an open-source implementation of Google's paper "Bigtable: A Distributed Storage System for Structured Data" [30]. The heart of an HBase table is a map relying on the simple concept of storing and retrieving key-value pairs, so every individual data value in HBase is indexed by a key. Furthermore, the map is always stored as a sorted map, the sorting being based on the lexicographical ordering of the various labels that make up a key [31,32].
Central to the design of an HBase table is the design of the row key [33,34]. Since the tile data mainly comprise the tile image itself, indexes, and tile metadata, the HBase table for saving tiles adopts the tall-table design principle (only one column family is defined in the big table) [31], which compresses the space occupied by tile storage to the maximum extent.
The number of bytes used to save slot information in the row key depends on the number of data nodes; in this paper, one byte identifies the slot in which the tile data is stored. A second byte in the row key saves the metadata index of the tile. The tile metadata includes the projection coordinates, control points, categories, producers, and other information about the tile data, to support the storage of different batches and categories of remote sensing data. The following eight bytes of the row key, a long integer, identify the tile's storage path, including the scaling level z and the x- and y-coordinates of the tile pyramid model. Single characters are used for the column family and column names. The structure of the row key is depicted in Figure 4.
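A hedged sketch of this ten-byte row key follows; the bit layout used to pack z, x, and y into the eight-byte long (8 bits for z and 28 bits each for x and y) is our own assumption, as the paper does not specify one.
import java.nio.ByteBuffer;

public class RowKeyLayout {
    // Sketch of the ten-byte row key of Figure 4: [slot label | metadata index | packed tile address].
    static byte[] buildRowKey(byte slotLabel, byte metaIndex, int z, long x, long y) {
        long packed = ((long) z << 56) | ((x & 0x0FFFFFFFL) << 28) | (y & 0x0FFFFFFFL);
        return ByteBuffer.allocate(10)
                .put(slotLabel)      // byte 0: slot computed by the jump consistent hash
                .put(metaIndex)      // byte 1: index into the tile metadata (projection, producer, ...)
                .putLong(packed)     // bytes 2-9: zoom level z and raster column/row x, y
                .array();
    }

    public static void main(String[] args) {
        byte[] rowKey = buildRowKey((byte) 'F', (byte) 1, 12, 3290, 1500);
        System.out.println(rowKey.length);   // 10
    }
}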

3.2.2. Storage Area Pre-Segmentation

Since the "slot" token occupies only one byte in the row key, the slot token takes a value in the range [0, 255]. The HBase master node assigns each data triplet (key, field, value) to a pre-specified target region based on the slot information in the row key header. The scalable storage architecture for tile data is illustrated in Figure 5. There are four layers in this architecture. The first layer is the slot numbers, which are consecutive integers. The second layer is a mapping of the first layer onto discrete characters. In ASCII order, groups of these characters form the regions that make up the third layer. The last layer consists of the physical servers, each of which hosts several regions of the third layer.
Because the row key cannot contain the asterisk character (*), ASCII sequences without this character are used to map the slot numbers to ASCII codes. To obtain the correct assignment of row keys marked with slot information to the target region, it is also necessary to use pre-partitioning techniques when creating HBase tables. The pre-partitioning script run under the HBase SHELL is as follows:
$> create 'Tiles', {NAME => 'f', VERSIONS => 1}, SPLITS => ['4', '=', 'F', 'O', 'X', 'a', 'j', 's']
This script creates a large table named 'Tiles' on the HBase cluster and pre-partitions it into nine regions. The number of pre-partitions is determined by the actual number of physical nodes in the cluster. In accordance with the load balancing principle, HBase distributes these regions evenly among the physical servers running the region server process.
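Equivalently, the same pre-split table can be created programmatically. The sketch below uses the HBase 2.x Java Admin API to mirror the shell command above; the connection settings are defaults and the class itself is our own illustration, not code from the paper.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class CreateTilesTable {
    public static void main(String[] args) throws Exception {
        // Eight split points produce the nine pre-partitioned regions of the shell command.
        byte[][] splits = new byte[][] {
            Bytes.toBytes("4"), Bytes.toBytes("="), Bytes.toBytes("F"), Bytes.toBytes("O"),
            Bytes.toBytes("X"), Bytes.toBytes("a"), Bytes.toBytes("j"), Bytes.toBytes("s")
        };
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            admin.createTable(
                TableDescriptorBuilder.newBuilder(TableName.valueOf("Tiles"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("f"))
                        .setMaxVersions(1).build())
                    .build(),
                splits);
        }
    }
}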

3.3. Tile Data Access

Since the pseudo-random function random() is consistent, that is, it produces the same random series for a given seed, to read the tile at zoom level z with coordinates x, y, the slot information assigned at data entry can be restored by using the index key built from z, x, y as the seed of the random function. This regenerates the row key under which the tile was saved in the large table, so each tile in the map view can be read quickly by concurrent GET requests.

4. Performance Analysis and Application Practice

4.1. Performance Analysis

To verify the actual performance of the tile data storage model described in this paper, we installed an HBase database to run on top of a widely available Hadoop cluster with five nodes and then conducted a comprehensive test to analyze the uniformity of the data distribution on the server and the efficiency of the data access.

4.1.1. Physical Hardware and Application Processes

The physical servers in the cluster are divided into two categories, NameNode servers and DataNode servers. Servers running NameNode are allocated more memory and fewer storage resources, while servers running DataNode are allocated more storage and fewer memory resources. The specific hardware resource allocations are shown in Table 1.

4.1.2. Analysis of Data Distribution Uniformity

To scientifically and comprehensively analyze the data distribution uniformity, we analyzed the uniformity of the data distribution along two dimensions: data storage on the servers and tile data sources during map browsing.
  • Uniformity of data storage distribution
Using 1 m resolution remote-sensing image tiles covering the entire area of Tianjin as the experimental data source, 890,760 tiles were stored in the HBase database using the storage model proposed in this paper, based on the jump consistent hash function. The distribution of the tiles across the region servers is shown in Table 2.
From Table 2 we can see that there are three physical data nodes in the cluster, named d1, d2, and d3. All of them run the Region Server role, and each region server hosts three regions. The key-value tuples, arranged in lexicographical order, are evenly divided into nine parts by the eight row-key start letters '4', '=', 'F', 'O', 'X', 'a', 'j', and 's'. The starting letter of each row key is the slot label computed by the model proposed in this paper.
Using the regions assigned on the region server as the horizontal axis and the number of tiles stored in the region and the space occupied as the vertical axis, we plot a histogram of the data on the region server in Figure 6:
The above analysis shows that the data storage model based on the jump consistency hashing algorithm achieves uniform storage of the tile data on the cluster data nodes and does not create a hotspot in the storage.
  • Map view tile data sources
Another indicator of storage uniformity is whether, when multiple users perform map browsing operations, the tiles constituting a given map view are sourced evenly from multiple data node servers. In this paper, we use the OpenLayers component to read the tile data stored in HBase with the jump consistent hash model. For each map view operation in the front end, the source of each tile in the view is recorded, so the uniformity of data acquisition across the node servers can be measured from the data retrieval application. Table 3 records the tile data retrieved from the data nodes (d1, d2, and d3) over 10 views (zoom in, zoom out, and pan) of the map.
Measuring the number of map browsing operations along the horizontal axis and the number of tiles loaded from the region server per operation along the vertical axis, we plot the tile data read load balancing curve in Figure 7:
Figure 7 shows, for map browsing operations such as panning, zooming in, and zooming out, which servers the tiles constituting the map view come from. The horizontal axis is the sequence number of the map operation that updates the view, and the vertical axis is the number of tiles returned by each physical server for that operation. From Figure 7 we can see that OpenLayers loads tile data evenly from the different data node servers for each map browsing operation, which effectively achieves load balancing during data publishing and avoids node overheating during data access.

4.1.3. Analysis of Data Retrieval Efficiency

To check the efficiency of the data access model proposed in this paper, the Tianjin city-wide 1 m resolution remote sensing images, obtained from the Local Construction Bureau, were stored in the HBase database using two technical schemes: one uses the data storage model described in this paper, and the other uses the tile storage path directly as the row key. Performing three refresh operations on the 20 tiles of the same map view under the two storage schemes, the read time of each tile per operation was recorded. The efficiency gain between the two schemes is calculated using the following equation:
η = (t̄1 − t̄2) / t̄1 × 100%,
where t̄1 represents the average time for each tile data acquisition under the direct storage scheme over the three refresh operations, t̄2 represents the average time for each tile data acquisition under the data model proposed in this paper over the three refresh operations, and η is the resulting efficiency improvement. The calculation shows that the efficiency improvement is 39% when the proposed data storage model is used.
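As a purely hypothetical worked example of the formula (the numbers are not measurements from this experiment), average times of t̄1 = 100 ms per tile under direct storage and t̄2 = 61 ms per tile under the proposed model would give η = (100 − 61)/100 × 100% = 39%.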
Using the horizontal axis for the tile number and the vertical axis for the tile retrieval time, we plotted the time curves of the two schemes in Figure 8. The curves show intuitively that, compared with direct HBase storage, implementing tile storage and retrieval with the jump consistent hash algorithm offers fast service response and stable, efficient performance.

4.2. Application in Practice

We applied the tile data storage model based on the jump consistent hash algorithm directly to the development and construction of the "Tianjin Eco-city GIS Service Platform". A high-availability Hadoop cluster was deployed in a private cloud environment, and an HBase database was installed on top of it. Using the proposed storage model, we stored the administrative map, the high-definition image map, and some oblique photography and street view data of the Eco-city for each month since 2010 in the HBase database. Using HBase's REST Server API, the storage model achieved direct and fast rendering of the tile data in the front end, with the result shown in Figure 9.

5. Conclusions

To address the problem of efficiently storing and publishing massive tile data on a big data storage platform, this paper proposed a tile data access model with the jump consistent hash algorithm at its core. By constructing a storage environment with an absolutely uniform distribution of tile data across the cluster storage nodes, the model achieved load balancing on the cluster under concurrent access, solved the data hotspot problem that arises when massive tile data are concurrently stored and retrieved, and significantly improved tile data retrieval efficiency under concurrent access conditions. Since the total number of slots, n, must be fixed when slots are calculated with the jump consistent hash algorithm, the data size must be predicted before a specific access scheme is implemented with this model. In practical applications, the value of n can be set generously, balancing data size and efficiency, in order to accommodate the horizontal expansion of the cluster as data volumes continue to grow. In future research, we plan to address the implementation of hashing algorithms that support variable values of n.
The model described in this paper can be applied not only to tile data access but also to any data access process whose key values are monotonic (e.g., time-series data). By exploiting the cluster's efficiency in handling concurrent requests, the model compensates for the weakness of key-value databases in multi-conditional combined queries and thus provides a new technical approach for other kinds of tile data access.

Author Contributions

Conceptualization, Wei Wang; methodology, Wei Wang; software, Wei Wang; validation, Wei Wang; formal analysis, Xiaojing Yao; investigation, Xiaojing Yao; resources, Wei Wang; data curation, Jing Chen; writing—original draft preparation, Wei Wang; writing—review and editing, Jing Chen; supervision, Wei Wang; project administration, Wei Wang. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Tianjin Natural Science Foundation under Grant No. 18JCYBJC84900.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Tao, F.; Xiao, B.; Qi, Q.; Cheng, J.F.; Ji, P. Digital twin modeling. J. Manuf. Syst. 2022, 64, 372–389.
2. Li, D.R.; Xu, X.D.; Shao, Z.F. On Geospatial Information Science in the Era of IoE. Acta Geod. Cartogr. Sin. 2022, 51, 1–8.
3. Jones, D.; Snider, C.; Nassehi, A.; Yon, J.; Hicks, B. Characterising the Digital Twin: A systematic literature review. CIRP J. Manuf. Sci. Technol. 2020, 29, 36–52.
4. Song, G.F.; Chen, Y.; Luo, Q.; Wu, M. Development and Prospect of GIS Platform Software Technology System. J. Geo-Inf. Sci. 2021, 23, 2–15.
5. Li, Q.Q.; Li, D.R. Big Data GIS. Geomat. Inf. Sci. Wuhan Univ. 2014, 39, 641–644.
6. Pei, T.; Huang, Q.; Wang, X.; Chen, X.; Liu, Y.X.; Song, C.; Chen, J.; Zhou, C.H. Big Geodata Aggregation: Connotation, Classification, and Framework. Natl. Remote Sens. Bull. 2021, 25, 2153–2162.
7. Kim, H.; Choi, H.; Kang, H.; An, J.; Yeom, S.; Hong, T. A systematic review of the smart energy conservation system: From smart homes to sustainable smart cities. Renew. Sustain. Energy Rev. 2021, 140, 110755.
8. Shi, J.Y.; Li, P. Key Technologies and Application Exploration of Aerospace Big Data in the Construction of New Smart City. Big Data Res. 2022, 8, 120–133.
9. Ramzan, S.; Bajwa, I.S.; Kazmi, R. Challenges in NoSQL-Based Distributed Data Storage: A Systematic Literature Review. Electronics 2019, 8, 488.
10. Van, L.H.; Atsuhiro, T. G-HBase: A High Performance Geographical Database Based on HBase. IEICE Trans. Inf. Syst. 2018, E101.D, 1053–1065.
11. Hajjaji, Y.; Boulila, W.; Farah, I.R. An improved tile-based scalable distributed management model of massive high-resolution satellite images. Procedia Comput. Sci. 2021, 192, 2931–2942.
12. Shen, B.; Liao, Y.C.; Liu, D.; Chao, H.C. A method of HBase multi-conditional query for ubiquitous sensing applications. Sensors 2018, 18, 3064.
13. Zheng, K.; Zheng, K.; Fang, F.; Zhang, M.; Li, Q.; Wang, Y.; Zhao, W. An extra spatial hierarchical schema in key-value store. Clust. Comput. 2019, 22, 6483–6497.
14. Yu, K.; Xiong, X.R.; Gao, T. Design and Implementation of Cloud Storage System for Map Tiles Based on Hadoop. J. Geomat. 2017, 42, 74–77.
15. Wang, X.; Sun, Y.; Sun, Q.; Lin, W.W.; Wang, J.Z.; Li, W. HCIndex: A Hilbert-Curve-based clustering index for efficient multi-dimensional queries for cloud storage systems. Clust. Comput. 2022, 1–15.
16. Wu, Y.; Cao, X.; Sun, W. MI-HCS: Monotonically increasing Hilbert code segments for 3D geospatial query window. IEEE Access 2020, 8, 47580–47595.
17. He, Z.; Liu, G.; Ma, X.; Chen, Q. GeoBeam: A distributed computing framework for spatial data. Comput. Geosci. 2019, 131, 15–22.
18. Tang, M.; Yu, Y.; Mahmood, A.R.; Malluhi, Q.M.; Ouzzani, M.; Aref, W.G. LocationSpark: In-memory Distributed Spatial Query Processing and Optimization. Front. Big Data 2020, 3, 30.
19. Baig, F.; Vo, H.; Kurc, T.; Saltz, J.; Wang, F. SparkGIS: Resource Aware Efficient In-Memory Spatial Query Processing. In Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Redondo Beach, CA, USA, 7–10 November 2017; pp. 1–10.
20. Fang, J.Y.; Liu, Y.; Yao, X.; Chen, C.T.; Zhang, M.F.; Xiao, Z.J.; Zhang, G.F. Research on Spark-based Real-time Query of Spatial Data. Geomat. World 2015, 6, 24–31.
21. Cui, C.; Zhen, L.H.; Han, F.P.; He, M.J. Design of Secondary Indexes in HBase Based on Memory. J. Comput. Appl. 2018, 38, 1584.
22. Wang, N.S.; Wang, W.J.; Zhang, Z. Research and Implementation of the Temporal Map Tile Data Storage Model Based on NoSQL Database. Geomat. Spat. Inf. Technol. 2020, 43, 132–134.
23. Web Map Tile Service Implementation Standard. Available online: https://www.ogc.org/standards/wmts (accessed on 18 November 2022).
24. Huo, L.; Yang, Y.D.; Liu, X.Y.; Qiao, W.H.; Zhu, W.Z. Research and Practice of Tiles Pyramid Model Technology. Sci. Surv. Mapp. 2012, 37, 144–146.
25. Su, X.M.; Tan, J. The Research of Key Technologies for The Tile Map in WebGIS. Beijing Surv. Mapp. 2012, 2, 9–12.
26. Ying, X.; Yang, X. Remote Sensing Image Data Storage and Search Method Based on Pyramid Model in Cloud. In International Conference on Rough Sets & Knowledge Technology; Springer: Berlin/Heidelberg, Germany, 2012; pp. 267–275.
27. Li, P.Y.; Jia, H. Spatio-temporal Block Index for Traffic Data Based on HBase. Inf. Technol. 2019, 12, 116–120.
28. Li, S.J.; Yang, H.J.; Huang, Y.H.; Zhao, Q. Geo-spatial Big Data Storage Based on NoSQL Database. Geomat. Inf. Sci. Wuhan Univ. 2017, 42, 163–169.
29. Lamping, J.; Veach, E. A Fast, Minimal Memory, Consistent Hash Algorithm. arXiv 2014, arXiv:1406.2294. Available online: https://arxiv.org/ftp/arxiv/papers/1406/1406.2294.pdf (accessed on 22 November 2022).
30. Chang, F.; Dean, J.; Ghemawat, S.; Hsieh, W.C.; Wallach, D.A.; Burrows, M.; Chandra, T.; Fikes, A.; Gruber, R.E. Bigtable: A Distributed Storage System for Structured Data. ACM Trans. Comput. Syst. 2008, 26, 1–26.
31. Apache HBase Reference Guide. Available online: https://hbase.apache.org/book.html#rowkey.design (accessed on 18 November 2022).
32. Design Principles for HBase Key and Rowkey. Available online: https://ajaygupta-spark.medium.com/design-principles-for-hbase-key-and-rowkey-3016a77fc52d (accessed on 22 November 2022).
33. Huang, J.; Zhao, J.; Guo, Y.; Mao, X.; Wang, J. The Application on Distributed Geospatial Data Management Based on Hadoop and the Application in WebGIS. In Proceedings of the 2021 9th International Conference on Agro-Geoinformatics (Agro-Geoinformatics), Shenzhen, China, 26–29 July 2021; pp. 1–4.
34. Li, X.J.; Zhao, J.J.; Nie, H.M.; Wang, Y. The Design and Verification of Row Key in HBase Database. Softw. Guide 2019, 18, 178–181.
Figure 1. Formation of cluster hotspotting [23].
Figure 2. Access model for tile data in HBase.
Figure 3. Jump consistent hash algorithm.
Figure 4. Structure of the row key for storing tiles.
Figure 5. Scalable storage architecture for tile data.
Figure 6. Data distribution in the region server.
Figure 7. Load balance in tile data reading.
Figure 8. Comparative analysis of server performance using two different storage schemes.
Figure 9. Application of the proposed model in a GIS project.
Table 1. Experimental hardware resource allocation.
Server | Memory | Hard Disk | Hadoop (High-Availability) Processes | HBase Process
n1 | 8 GB | 200 GB | NameNode, DFSZKFailoverController | HMaster
n2 | 8 GB | 200 GB | NameNode, DFSZKFailoverController | HMaster
d1 | 4 GB | 500 GB | DataNode, JournalNode, QuorumPeerMain | HRegionServer
d2 | 4 GB | 500 GB | DataNode, JournalNode, QuorumPeerMain | HRegionServer
d3 | 4 GB | 500 GB | DataNode, JournalNode, QuorumPeerMain | HRegionServer
Table 2. Distribution of tile data on region servers.
Region Server | Row Key Starter | Row Key Terminator | Number of Tiles Stored (pcs) | Space Used (GB)
d2:16030 |  | 4 | 99,468 | 1.17
d3:16030 | 4 | = | 99,174 | 1.17
d3:16030 | = | F | 98,472 | 1.16
d2:16030 | F | O | 98,984 | 1.17
d1:16030 | O | X | 99,548 | 1.17
d2:16030 | X | a | 98,833 | 1.15
d3:16030 | a | j | 98,666 | 1.15
d1:16030 | j | s | 98,513 | 1.16
d1:16030 | s |  | 99,102 | 1.17
Table 3. Tile data source for the same map view.
Operation Num | Region Server d1 (Reg:1 / Reg:2 / Reg:3 / Total) | Region Server d2 (Reg:1 / Reg:2 / Reg:3 / Total) | Region Server d3 (Reg:1 / Reg:2 / Reg:3 / Total)
1 | 3 / 4 / 5 / 12 | 4 / 4 / 5 / 13 | 4 / 7 / 4 / 15
2 | 7 / 2 / 5 / 14 | 6 / 3 / 7 / 16 | 5 / 2 / 3 / 10
3 | 6 / 4 / 3 / 13 | 6 / 3 / 3 / 12 | 2 / 3 / 2 / 7
4 | 1 / 2 / 6 / 9 | 2 / 5 / 6 / 13 | 3 / 2 / 4 / 9
5 | 4 / 3 / 5 / 12 | 8 / 4 / 6 / 18 | 5 / 1 / 3 / 9
6 | 9 / 3 / 4 / 16 | 3 / 2 / 6 / 11 | 9 / 5 / 2 / 16
7 | 2 / 4 / 5 / 11 | 3 / 4 / 5 / 12 | 5 / 2 / 6 / 13
8 | 10 / 4 / 5 / 19 | 1 / 2 / 5 / 8 | 3 / 4 / 4 / 11
9 | 7 / 3 / 5 / 15 | 5 / 2 / 6 / 13 | 9 / 2 / 2 / 13
10 | 3 / 3 / 6 / 12 | 3 / 4 / 3 / 10 | 2 / 6 / 4 / 12
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

