State-of-the-Art Geospatial Information Processing in NoSQL Databases

Geospatial information has been indispensable for many application fields, including traffic planning, urban planning, and energy management. Geospatial data are mainly stored in relational databases that have been developed over several decades, and most geographic information applications are desktop applications. With the arrival of big data, geospatial information applications are also being modified into, e.g., mobile platforms and Geospatial Web Services, which require changeable data schemas, faster query response times, and more flexible scalability than traditional spatial relational databases currently have. To respond to these new requirements, NoSQL (Not only SQL) databases are now being adopted for geospatial data storage, management, and queries. This paper reviews state-of-the-art geospatial data processing in the 10 most popular NoSQL databases. We summarize the supported geometry objects, main geometry functions, spatial indexes, query languages, and data formats of these 10 NoSQL databases. Moreover, the pros and cons of these NoSQL databases are analyzed in terms of geospatial data processing. A literature review and analysis showed that current document databases may be more suitable for massive geospatial data processing than are other NoSQL databases due to their comprehensive support for geometry objects and data formats and their performance, geospatial functions, index methods, and academic development. However, depending on the application scenarios, graph databases, key-value, and wide column databases have their own advantages.


Introduction
The amount of personal location data is forecast to increase by 20% every year, and location-aware information accounts for a large proportion of the 2.5 quintillion bytes of data generated every day [1,2]. The advent of the geospatial big data era requires new applications and creates new challenges [3,4]. How to store, manage, and query geospatial data effectively has become a focus of research and a problem that must be solved [3,5–7]. At present, the main geospatial databases are divided into two types: relational databases and NoSQL databases. Relational databases are the most widely used and most mature database systems, and they have been applied in various industries for decades. To enrich their geospatial functions and processing capabilities, some modern relational databases have been extended. Examples of relational databases for geographic information include PostGIS [8], WebGIS [9], Oracle 19c [10], the Microsoft Azure SQL Database [11], and SQL Server [12]. These relational databases can define geospatial objects, support the main spatial data types (for geometry), and adopt different indexes for fast spatial queries (B-Tree in SQL Server; B-Tree, R-Tree, and Generalized Search Tree in PostGIS). Additionally, most applications using spatial relational databases are desktop systems (such as ArcGIS) or have map server software (such as GeoServer). Traditional relational databases adopt fixed structure/data schemas, and their scalability is limited.
NoSQL databases are general distributed database systems, which may not require structured data, are typically designed for scaling horizontally, and may be open source [5,13]. To achieve horizontal scalability, NoSQL databases typically do not provide the standard ACID properties (atomicity, consistency, isolation, and durability) that relational databases provide. However, NoSQL databases can store, manage, and index arbitrarily large datasets while supporting a large number of concurrent user requests [14]. NoSQL databases are now widely used in various application fields [4,15–17].
With the development of mobile communications, the IoT, and high-speed network access technologies, the need for geographic information applications for mobile services and web services has become increasingly strong [2–4]. New geospatial applications require more flexible data schemas, faster query response times, and more elastic scalability than traditional spatial relational databases currently offer. For example, a sudden increase in streaming requests from clients to servers might cause significant response delays and service unavailability. To solve this scalability problem, a scalable framework based on MongoDB was proposed to implement elastic deployment for geospatial information sharing as the number of client users grows [14]. In this framework, MongoDB was chosen because it is a distributed database and supports a flexible storage schema suitable for massive map tile storage [14].
Several studies have found that relational database management systems (RDBMS) have some disadvantages in terms of big data storage and queries in some specific areas, such as in highly concurrent or large-scale data access environments in geospatial applications [4,5]. In one qualitative comparison of experiments, it was found that document databases have faster response times and faster line intersection queries than SQL databases when the number of records in the databases is large [5,18]. Other studies have indicated that NoSQL databases have more advantages in geospatial data processing than relational databases [6]. In tests of the most used spatial query functions in different databases, NoSQL databases performed better than relational databases, especially in mobile-GIS and Web-GIS settings [18,19]. Currently, most NoSQL databases are viewed as not being well designed for geospatial data [6,20]. One of the most obvious deficiencies of NoSQL databases in terms of geospatial data is that they only have basic spatial functions, far fewer than relational databases have [6]. Fortunately, research on NoSQL databases is a burgeoning field, attracting more and more attention from enterprises and academics, and improvements and innovations have rapidly emerged. Moreover, some NoSQL databases have some spatial functions and spatial indexes [18,21]. In recent years, some academic articles have summarized and analyzed the applications of SQL and NoSQL databases in geographic data fields, including one study of geospatial big data [3], a summary of the best practices for publishing spatial data on the web [22], an experimental comparison between two geospatial data platforms [19,23], comparisons between relational databases and NoSQL databases in geospatial applications [5,18], and a study on geospatial semantic data management [24]. However, there have been few comprehensive analyses of state-of-the-art geographic data processing in popular NoSQL databases.
To elaborate on state-of-the-art geographic data processing in popular NoSQL databases, in this paper, we first introduce the geospatial data characteristics and related concepts and then review state-of-the-art geospatial data processing in the 10 most popular NoSQL databases. Moreover, we analyze the pros and cons of these NoSQL databases for geospatial data processing. The paper structure is as follows. Section 2 introduces the geospatial data characteristics and related concepts. Section 3 introduces the research methodology adopted for this paper. State-of-the-art geospatial processing in NoSQL databases is presented in Section 4. Section 5 compares the performances of the different NoSQL databases in terms of geospatial data storage and queries. Section 6 includes a brief conclusion.

Geospatial Data Characteristics and Related Concepts
Before discussing geospatial data processing, here we introduce the basic geospatial concepts and characteristics of geospatial science.
Generally, there are two main ways to represent geospatial data: raster and vector data.
• Raster data are made up of a matrix of cells (grain or pixels), in which a cell has an associated value representing information, such as a brightness value or temperature, and are arranged into rows and columns (or a grid).

• Vector data consist of individual points that are stored as pairs of (x, y) in 2D cases or (x, y, z) in 3D cases. The points are connected through certain orders/rules to create lines, polygons, surfaces, and solids. In this paper, most of the discussions refer to vector data.
Features and geometries are the two main foundational concepts. A feature can be any object with a given spatial location, such as an airport or a mountain.

• According to ISO 19109:2015, a feature is defined as the "abstraction of real-world phenomena". Features may have attributes, e.g., spatial attributes giving the location/extent of the feature, thematic attributes giving descriptive characteristics of the feature, and also other kinds of attributes, such as metadata/quality.

• The geometry is any geometric shape that can represent a feature's spatial attribute, such as a point (0D), line (1D), polygon/surface (2D), or solid/volume (3D). The geometries can be embedded in 1D space, 2D space, or 3D space. The dimension of the geometry must be smaller than or equal to the dimension of the embedded space. For simple cases such as visualization using traditional 2D maps (2D space), points, lines, and polygons might be sufficient for user needs. For more complex cases requiring 3D space, surfaces and volumes/solids are also required.
In relational databases, based on the international standard ISO/IEC 13249: 2016, SQL Multimedia Application Packages provide 27 geometry types, of which 24 geometry types are instantiable and have constructor functions. The geometry types and methods in relational databases are obviously more abundant than in NoSQL databases.
To handle the features' spatial locations and relationships, coordinate reference systems are needed.
• A coordinate reference system (CRS), or a spatial reference system (SRS), is a coordinate-based system for locating geographical entities and establishing their relationships. Popular coordinate reference systems include the geocentric coordinate system, geographic coordinate system (WGS84 datum), Universal Transverse Mercator (UTM), and Cartesian coordinate system.

• In coordinate reference systems, Well-known Text (WKT) is a text markup language that represents coordinate reference systems and conversions between different coordinate reference systems, as defined by the Open Geospatial Consortium (OGC).

• The EPSG Geodetic Parameter Dataset (also called the EPSG registry), which was developed by the European Petroleum Survey Group (EPSG) in 1985, is a public collection of the definitions of coordinate reference systems and coordinate transformations. The EPSG code is widely used in geographic information systems and GIS libraries.
Additionally, topological relationships are critical to geospatial processing and data queries. Some issues with instance relationships can be solved through features' topological relationships. In addition to the traditional graph topology, there are three popular topological relation principles: implicit topology as in simple features [25], and point-set topology as in both the Egenhofer nine-intersection relations [26] and the RCC8 relations [27]. Battle et al. summarized the equivalence between the three spatial relations, as shown in Table 1 [28]. The special characteristics of geospatial data (especially their multi-dimensionality and the large size of datasets) make processing them different from processing other data. These data require different platforms, flexible scalability, and ease of modification, update, and query, for example in large-scale spatiotemporal query scenarios [29] and under huge numbers of access requests from mobile platforms [14].

The Research Methodology
Many NoSQL databases and related products are continually being developed and updated. To compare and discuss the geospatial development of NoSQL databases effectively, the 10 most popular NoSQL databases (as ranked by DB-Engines [30]) were chosen, and their characteristics and performance in terms of geospatial data processing were analyzed. These NoSQL databases fall into six types of database models: document databases, graph databases, wide column databases, key-value databases, multi-model databases, and search engines, as listed in Table 2. We used the name of each of the 10 databases and "geospatial OR spatial" as search keywords to search for articles in general academic databases, including the Web of Science Core Collection, Google Scholar Citations, and the Scopus database. After filtering out duplicated and unrelated articles, we obtained our final set of articles; the number found for each database is shown in round brackets in Table 2. Through an analysis of NoSQL databases and related research, we explored the state-of-the-art of geospatial information processing in NoSQL databases. Few NoSQL databases support 3D data scenes, so unless otherwise noted, the geometry functions and indexes refer to 2D data scenes.

State-of-the-Art Geospatial Processing in NoSQL Databases
In this section, basic information about these databases is shown in Table 3. After that, some succinct information about the geospatial characteristics in these databases is introduced. Additionally, related research from the different databases is also summarized.

MongoDB
MongoDB is a document database that stores data in scalable, flexible, JSON-like documents, with different data fields and a changeable data structure. MongoDB does not support a declarative query language: queries in MongoDB are built and issued through a proprietary API or drivers. MongoDB supports storing and querying geospatial data. To describe GeoJSON data, MongoDB uses an embedded document that names the GeoJSON object type and then gives the object's coordinates, listing the longitude first and then the latitude (see Table 4). Additionally, MongoDB uses the WGS84 reference system for geospatial queries of GeoJSON objects. Valid longitude values are between −180 and 180 (inclusive), and valid latitude values are between −90 and 90 (inclusive).
MongoDB also stores object location data as legacy coordinate pairs, for which it supports spherical surface calculations via a 2dsphere index. Legacy coordinate pairs have two data representations, an array (preferred by MongoDB) and an embedded document:
An array: <field>: [ <x>, <y> ]
An embedded document: <field>: { <field1>: <x>, <field2>: <y> }
MongoDB provides two geohash-based geospatial index types: 2dsphere and 2d. With 2dsphere indexes, queries are evaluated by calculating geometries on an Earth-like sphere. With 2d indexes, queries are evaluated by calculating geometries on a two-dimensional plane. For spherical queries, 2dsphere indexes should be used, because the use of 2d indexes for spherical queries may lead to incorrect results. Four geospatial query operators are provided in MongoDB: $geoIntersects, $geoWithin, $near, and $nearSphere.
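The document shape and coordinate limits described above can be illustrated with a minimal Python sketch. It only builds plain dictionaries and does not connect to a server; the helper names (geojson_point, near_sphere_filter), the field name "location", and the sample coordinates are hypothetical and chosen for illustration.

```python
import json


def geojson_point(longitude, latitude):
    """Build a GeoJSON Point in the order MongoDB expects: longitude first."""
    if not -180 <= longitude <= 180:
        raise ValueError("longitude must be between -180 and 180 (inclusive)")
    if not -90 <= latitude <= 90:
        raise ValueError("latitude must be between -90 and 90 (inclusive)")
    return {"type": "Point", "coordinates": [longitude, latitude]}


def near_sphere_filter(field, longitude, latitude, max_distance_m):
    """Build a $nearSphere filter document; the distance is given in meters."""
    return {
        field: {
            "$nearSphere": {
                "$geometry": geojson_point(longitude, latitude),
                "$maxDistance": max_distance_m,
            }
        }
    }


# Hypothetical document and query; "location" is an assumed field name.
place = {"name": "example place", "location": geojson_point(-73.97, 40.77)}
query = near_sphere_filter("location", -73.97, 40.77, 500)
print(json.dumps(place))
print(json.dumps(query))
```

With a driver, a filter of this shape would be passed to a find call on a collection that has a 2dsphere index on the queried field.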
In a quantitative comparison of geospatial big data processing between the PostGIS and MongoDB databases, MongoDB had some advantages with its "within" and "intersection" queries [18,31] and in terms of its response time for loading big geospatial data [32]. Meanwhile, in a comparison between ArcGIS and MongoDB, the spatial retrieval performance of MongoDB was better than that of ArcGIS, and this advantage was more obvious with an increase in the point set [33].
In related research on MongoDB, attention has focused on geospatial data management and storage [34–43] and on index development methods [29,44–46]. Using MongoDB, several platforms and frameworks have been designed to meet diverse application demands. Although NoSQL databases can store JSON objects, a standard query language is still missing, so individuals who are not programmers have a hard time managing, analyzing, and correlating geospatial data. To solve these problems, a framework and a query language were designed to manipulate JSON objects and provide spatial and non-spatial operations across heterogeneous datasets [34]. To provide a multi-user collaborative work environment, a prototype system was developed based on open-source web technologies, in which geospatial data were processed according to the OGC standard and converted to GeoJSON format to be stored in MongoDB [43]. When multiuser requests to servers increase beyond a certain extent, response times may degrade and the service may become unavailable. To address this scalability problem, a scalable Web Map Tile Services (WMTS) framework was designed with a high-performance cluster to implement elastic deployment as user requests grow in number [14]. Aside from the problems of multiple user requests, the management of very large geospatial datasets is also an urgent problem. A software tool called GeoRocket has been created to manage very large geospatial datasets in the cloud [36]. GeoRocket splits large datasets into chunks and processes the chunks individually. GeoRocket has adopted Elasticsearch to index and query large datasets and uses MongoDB for data storage [36].
Although MongoDB provides index methods, research is still being done to find faster indexes. Xiang et al. proposed a method of implementing an R-Tree index, which combines spatial range queries and nearest neighbor queries in MongoDB [44]. Using a tabular document structure, they flattened the R-Tree index into MongoDB collections, and their experiment showed that the new method performed better than the 2dsphere index (MongoDB's built-in spatial index) [44]. Another R-Tree method for MongoDB was proposed by Li et al., while a geohash-based spatial index has been applied in location-based queries for a medical monitoring system, which combined nested minimum boundary rectangles (MBRs), an R-Tree as a global tree for real-time locations, and a geohash-based B-Tree as a local tree for historical data [45]. Additionally, some researchers have used MongoDB in the spatiotemporal index design of massive trajectory data [29,47].
MongoDB provides a broad and flexible platform for different geospatial applications. Moreover, MongoDB adopts some tactics to improve performance and availability, such as asynchronous replica updates and load balancing across replicas, but these tactics can affect the consistency of one or multiple objects. This also happens with other NoSQL databases, such as HBase, Cassandra, and Neo4j.

Couchbase
The Couchbase Server is an open-source, distributed, document-oriented database with fast key-value storage and a powerful query engine for executing an SQL-like query language (N1QL) [48]. The Couchbase Server is designed to provide low-latency data management services in specific environments, such as large-scale interactive web, mobile, or IoT applications. Like MongoDB, the Couchbase Server supports some geometry primitives. For geospatial queries, it has two location representation models: radius-based and box-based. In the radius-based model, a location is given as a longitude-latitude coordinate pair together with a distance in miles. This distance is the length of the radius of a circle centered on the given location, and documents whose location lies within the circle are returned. In the box-based model, two longitude-latitude coordinate pairs are required, which are located at the top-left and bottom-right corners of a box, and the JSON documents whose location lies within the box are returned. The Couchbase Server provides R-Tree indexes for location-aware applications. Additionally, spatial indexes can also be defined by users before a geospatial query. The spatial indexes differ depending on which of the two location representation models (a location or a bounding box) is used. Couchbase only provides queries based on location coordinate data, which can limit its applications.
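The two containment tests behind the radius-based and box-based models can be sketched in plain Python. This is an illustrative approximation, not Couchbase's implementation: it assumes a spherical Earth with a mean radius of 3958.8 miles, a box that does not cross the antimeridian, and hypothetical function names.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two longitude-latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(doc_location, center, radius_miles):
    """Radius-based model: is the document inside the circle around center?"""
    return haversine_miles(*doc_location, *center) <= radius_miles


def within_box(doc_location, top_left, bottom_right):
    """Box-based model: is the document inside the box given by two corners?"""
    lon, lat = doc_location
    left, top = top_left
    right, bottom = bottom_right
    return left <= lon <= right and bottom <= lat <= top


# Locations are (longitude, latitude) pairs.
print(within_radius((0.5, 0.0), (0.0, 0.0), 40))
print(within_box((0.5, 0.5), (0.0, 1.0), (1.0, 0.0)))
```

A real deployment would push these predicates into the database's spatial index rather than scanning documents one by one; the sketch only shows what each model tests.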
There was one article about the Couchbase database in the search results described in Section 3. In it, an information-centric network was adopted to federate MongoDB and Couchbase databases [35]. The functional architecture of the designed federated database included a federation front-end for effective connection between a query processor and users; the query processor for interacting with local and remote DBMSs; and a DBMS adapter for translating the federated query into the local query language of a local DBMS [35].

Neo4j
Neo4j is the most popular graph database and uses Cypher as its query language, but it supports only one type of spatial geometry, Point, in the latest Version 3.5. Each point in Neo4j can have a 2D or 3D representation and can be specified in a geographic coordinate reference system or a Cartesian coordinate reference system. Because Neo4j has only one spatial geometry type, the database provides only point-related spatial functions, such as distance(). There is a utility library called Neo4j Spatial that facilitates the spatial manipulation of data. Neo4j Spatial supports seven common geometry types (Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection) as well as topology operations. Additionally, Neo4j Spatial adopts an R-Tree index for spatial queries and provides multiple spatial procedures [50].
In related research, Sarwat et al. implemented a reachability query with a spatial range predicate on the Neo4j graph database, which determines whether an input point can reach any spatial point that lies within an input spatial range [51]. Because of the lack of optimization of spatial predicates in existing graph query processors, Sun et al. proposed a query operator, GEOEXPAND, which adds spatial data awareness to a graph DBMS to execute graph queries with spatial predicates efficiently [52]. Additionally, a carpool matching system was designed to recommend carpools based on vehicles' weekly frequent trajectories [53]. The time series of locations in a trajectory was connected while building a time tree through the use of the GraphAware Neo4j TimeTree library. For spatial data, the Neo4j Spatial library was used to model and query trajectory data. Using a carpool matching strategy, the efficiency and efficacy of the proposed system were evaluated [53]. Neo4j has also been applied in agriculture and animal husbandry applications, for example, using web technology and a Neo4j shell to evaluate the condition of crops on the basis of geospatial data [54] and identifying relations between the members of a cattle herd based on spatial and graph databases [55].
Because Neo4j only supports the point geometry type and point-based spatial functions, it is generally used in location-related applications.

Apache Cassandra
Apache Cassandra is an open source, distributed, wide column storage database management system. Cassandra and the Cassandra Query Language (CQL) do not support spatial queries. Cassandra's main method for supporting geospatial data is Stratio's Lucene Index for Cassandra, a plugin for Apache Cassandra that supports geospatial data indexing (points, lines, polygons, etc.), geospatial transformations (bounding box, centroid, convex hull, buffer, union, difference, intersection), and geospatial operations (intersects, contains, etc.). In Cassandra, the main geometry objects include Point, LineString, and Polygon, while Cassandra itself does not provide a spatial index. Cassandra is a highly scalable and high-performance data store, but it provides limited capabilities for data analyses and limited scalable functions, including a lack of adequate support for spatial data operations. To surmount these problems, an extension of the Cassandra Query Language was developed to implement spatial queries in the Cassandra database [56]. This extension converts the latitudinal/longitudinal values into a numeric geohash attribute and associates it with the data during data storage operations. Then, a spatial query parser and spatial syntax were designed and defined as a CQL spatial extension. After that, an aggregation algorithm is executed to reduce the search space and optimize the sub-queries sent to the cluster nodes. The stored data can be indexed through a geohashing technique [56]. Moreover, a novel approach to coupling Cassandra with the Secondo DBMS was proposed by Nidzwetzki et al. to support all DBMS functions, including models of spatial and moving object data, with high availability and scalability [57]. Building on that work, Nidzwetzki et al. further expanded this approach [58] to build a DBMS that was distributed, general-purpose, fault-tolerant, and parallel.
Similarly, to solve issues with spatial queries in the Cassandra database, a framework was developed by integrating Hadoop and Cassandra to run spatial queries on data stored in Cassandra [59]. The experimental results showed that a user-defined partitioning technique, called prefix-based partitioning, performed better in a geospatial search than did Cassandra's default partitioning algorithm [59]. For similar purposes, another framework combining Spark and Cassandra was proposed to provide data loading and data retrieval solutions for spatial data [60]. This framework includes a spatial data storage layer (based on Cassandra), a Spark core layer (using standard Spark core APIs), a spatial data processing layer (as an interface to query spatial and non-spatial data), and an application layer, in which a Spark-Cassandra connector [61] provides seamless integration between Spark and Cassandra [60].
Cassandra provides limited geometry types, geometry indexes, and functions, so additional design work or tools/components are required for geospatial data processing, such as an index extension [56] or a combination with other tools [59–61].
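The geohash conversion used by the CQL spatial extension [56] can be illustrated with a minimal sketch of the standard geohash encoding. This is not the implementation from [56]: that work stores a numeric geohash attribute, whereas the sketch below produces the common base-32 text form, and the function name and default precision are assumptions.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(latitude, longitude, precision=9):
    """Encode a latitude/longitude pair as a base-32 geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_longitude = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_longitude:
            mid = (lon_lo + lon_hi) / 2
            if longitude >= mid:
                bits = bits * 2 + 1
                lon_lo = mid
            else:
                bits = bits * 2
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if latitude >= mid:
                bits = bits * 2 + 1
                lat_lo = mid
            else:
                bits = bits * 2
                lat_hi = mid
        use_longitude = not use_longitude
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, 5))
```

Because nearby points usually share a geohash prefix, a prefix can serve as a partition or clustering key, which is the property that lets a range scan over keys approximate a spatial search.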

Apache HBase
Apache HBase is an open source, distributed, versioned, disk-based database. HBase does not support a declarative query language, and queries in HBase are issued through proprietary APIs. HBase does not have special geospatial functions to support geospatial data storage and querying [62]. However, researchers have developed some methods and applications for processing geospatial data in HBase [63–70], such as a geographical database with geohash-based spatial indexes [63,71], big spatial data processing with Apache Spark [72], a geospatial data model [64], and a new spatial query method based on primary key indexing [73]. Additionally, an open source suite of tools, GeoMesa, was designed to implement large-scale geospatial analytics and querying in the cloud or in conjunction with the HBase and Cassandra databases [74].
Besides the above-mentioned research, other work on geospatial processing in HBase follows three directions. The first is based on the MapReduce mechanism [75,76]. Hadoop MapReduce is a software framework through which designers can easily write applications to process huge amounts of data in parallel on large clusters of commodity hardware [77], while HBase uses base classes to support MapReduce jobs with HBase tables [76]. The central idea of MapReduce is that massive datasets are first divided into independent chunks stored in the clusters, and these chunks are then processed in parallel by different tasks and methods [75,76].
Other researchers have designed a new index structure to manage data for HBase [69,70,78,79]. Du et al. proposed a novel hybrid index structure to organize data by developing a statistical grid-based R-Tree for indexing space and by using a Hilbert curve for neighbor finding [78]. In HBase, a novel spatial index structure with geohash encoding was designed [69,80]. Additionally, Jo et al. developed a hierarchical index structure for effective spatial query processing in HBase, called a Q-MBR (quadrant-based minimum bounding rectangle) tree [79]. Through Q-MBR, the space is split into quadrants, and MBR is created in each quadrant. Then, the spatial objects are accessed through an index tree in a hierarchical manner. Based on the Q-MBR tree, different algorithms have been designed for different query operations [79].
The third research direction is to build storage models/schemas of spatial data in HBase [67,73,81–83]. Wang et al. proposed a Z storage schema with row keys based on a Z curve for massive spatial vector data in HBase [67]. After that, Zhang et al. improved this Z storage schema, and their experiments showed that the Z storage schema had a higher spatial query efficiency than did tree-based storage schemas (Quadtree storage and R-Tree storage schemas) [83].
Furthermore, Zhai et al. combined the distributed HBase database and a global subdivision grid to manage data effectively: with their method, grid geocodes presented the spatial position of an object and were regarded as a key-value in HBase [81]. Meanwhile, Zheng et al. considered spatial adjacency and proposed a spatial data storage optimization strategy for the HBase database: their method stores adjacent spatial objects in the same data fragment [84].
Because HBase does not support geometry types, geometry indexes, or functions, additional design work or tools/components are required to process geospatial data.
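The idea of a row key based on a Z curve, as in the Z storage schema [67], can be illustrated by interleaving the bits of a grid cell's column and row indexes. The sketch below is a generic Z-order (Morton) encoding under assumed parameters (a global longitude-latitude grid with 16 bits per axis and a hexadecimal key format); it is not the exact key layout of [67] or [83].

```python
def interleave_bits(col, row, bits=16):
    """Interleave the bits of col and row into one Z-order (Morton) value."""
    key = 0
    for i in range(bits):
        key |= ((col >> i) & 1) << (2 * i)       # column bits at even positions
        key |= ((row >> i) & 1) << (2 * i + 1)   # row bits at odd positions
    return key


def z_row_key(longitude, latitude, bits=16):
    """Map a point to a fixed-width hex row key on an assumed global grid."""
    cells = 1 << bits
    col = min(int((longitude + 180.0) / 360.0 * cells), cells - 1)
    row = min(int((latitude + 90.0) / 180.0 * cells), cells - 1)
    width = (2 * bits + 3) // 4  # hex digits needed for 2 * bits bits
    return format(interleave_bits(col, row, bits), "0{}x".format(width))


print(interleave_bits(3, 5))
print(z_row_key(116.39, 39.91))
```

Fixed-width keys sort lexicographically in the same order as their numeric Z values, so cells that are close on the curve are stored near each other in a key-ordered store such as HBase.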

Redis
Redis is an open source (BSD licensed), in-memory key-value database that supports multiple data structures, including strings, hashes, lists, sets, bitmaps, geospatial indexes with radius queries, and streams. Redis implements queries through specific APIs and provides six geospatially related commands: geoadd, geodist, geohash, geopos, georadius, and georadiusbymember. These commands are easy to use in geospatial data operations. For example, the geoadd command format is "GEOADD (set name) longitude latitude (object name)", with the longitude given before the latitude; the command adds the specified geospatial object (longitude, latitude, name) to the specified key. Data are stored in the key as a sorted set, so the object can later be retrieved by a radius query with the georadius or georadiusbymember commands.
An example, for an object at latitude 15.45244 and longitude −76.78506, is:

GEOADD building -76.78506 15.45244 my-house
To delete a member from the Geo Set, Redis provides the ZREM command:

ZREM building my-house
Redis is an in-memory database that can also persist data to disk. When an important change in the data is made, Redis must be instructed to save the change to disk. Additionally, Redis has a limited ability to create relationships between data objects.
Due to its characteristics, Redis is generally used for quick response tracking systems, such as ship tracking [16] and public transportation vehicle tracking [85], in which real-time data need to be displayed or processed in a timely manner, while the snapshot memory data are not required to be stored immediately.

Amazon DynamoDB
Amazon DynamoDB is a NoSQL database providing fast and predictable performance with seamless scalability, document storage, key-value storage, and low-level APIs (protocol-level interface) for managing database tables and indexes. To make location-based applications easier to build, the Geo Library for Amazon DynamoDB was designed, in which a GeoPoint (with a latitudinal value and a longitudinal value) is encoded in a GeoJSON string. Geohash indexes are used for fast location-based queries, including box queries and radius queries. Moreover, the library provides multiple geospatial operations, such as GeoPoint, putPoint, deletePoint, updatePoint, queryRectangle, and queryRadius. No academic articles on geospatial processing with DynamoDB were found.

Elasticsearch
Elasticsearch is a search engine and document database that was developed in Java. Elasticsearch provides a JSON-based query language, the Query DSL (Domain Specific Language), to define queries. It supports two geo datatypes: a Geo-point datatype (latitude and longitude pairs) and a Geo-shape datatype, which supports Points, Lines, Circles, Polygons, MultiPolygons, etc. In terms of queries, Elasticsearch has four geo queries: geo_shape, geo_bounding_box, geo_distance, and geo_polygon, and it provides multiple prefix trees, including GeohashPrefixTree and QuadPrefixTree. Additionally, geospatial data can be represented using either the GeoJSON or the Well-Known Text (WKT) format. An example of GeoJSON is shown here:
POST /example/_doc
{
  "location" : {
    "type" : "point",
    "coordinates" : [-77.03653, 38.897676]
  }
}
We found some articles on Elasticsearch, including one on the usage of Elasticsearch for queries and the storage of local geographic information [86] and one on Elasticsearch as-a-service in Elastic's Elastic Cloud [87]. Elasticsearch is also used for indexing and querying big geospatial datasets in the GeoRocket system [36].
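As an illustration of the JSON-based query language, the following Python sketch builds the request body of a geo_distance query. It assumes a hypothetical field named "pin.location" mapped as a Geo-point and only constructs the body; it does not send it to a cluster, and the helper name is made up for this example.

```python
import json


def geo_distance_query(field, latitude, longitude, distance):
    """Build a Query DSL body that keeps documents within distance of a point."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": distance,  # e.g. "200km"
                        field: {"lat": latitude, "lon": longitude},
                    }
                }
            }
        }
    }


# Hypothetical field name and coordinates.
body = geo_distance_query("pin.location", 40.0, -70.0, "200km")
print(json.dumps(body, indent=2))
```

Note the difference in coordinate conventions: the object form of a Geo-point names lat and lon explicitly, whereas a GeoJSON coordinates array lists the longitude first.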

Splunk
Splunk Inc. offers several software products with powerful search and analysis capabilities for enterprise data management. One of them is Splunk Enterprise, which can ingest data from websites and the IoT (Internet of Things) and performs well in data management and mining. Splunk's query language is called the Search Processing Language (SPL). In Splunk software, a geospatial lookup is first used to generate a query, and the query results are then illustrated by a choropleth map visualization. A geospatial lookup matches event/object location coordinates to a geographic feature collection stored in a Keyhole Markup Zipped (KMZ) file or a Keyhole Markup Language (KML) file. The format for defining a geospatial lookup is as follows:

[<lookup_name>]
external_type = geo
filename = <name_of_KMZ_file>
feature_id_element = <XPath_expression>

Splunk Enterprise provides two geospatial lookups, one for the United States and one for other countries. We found no academic articles on geospatial processing with Splunk.

Solr
Solr is an open-source search platform with high reliability, scalable indexing, search functions, fault tolerance, and load-balanced querying. Its default query parser is the "Lucene" query parser. To store spatial data, Solr supports the WKT and GeoJSON formats, but the data format needs to be designated through a "format type name" before data can be stored. There are two inner parameters: f for the field name and w for the format name.
For geospatial data queries, Solr supports indexing points or other shapes, filtering search results by a bounding box, circle, or other shape, sorting search results by distance, and even boosting results by distance. Moreover, Solr supplies four main field types for spatial searches, including:

1. LatLonPointSpatialField: this is most commonly used for latitude-longitude point data;
2. RptWithGeometrySpatialField: for indexing and searching for non-point data (it can do points, as well, but it cannot do sorting/boosting);
3. LatLonType (now deprecated): it still exists, but has been replaced by LatLonPointSpatialField.
The parameters of a spatial filter query mean the following: d is the radial distance, usually in kilometers; pt is the center point in the format "latitude,longitude"; sfield is a spatially indexed field. The geofilt filter retrieves results based on the geospatial distance (circle distance), using the given point as the center of a circle and d as its radius; the bbox (bounding box) filter uses the bounding box of the geofilt circle.
Because the bounding box is loose, a bbox filter with d = 5 may return some stores that are actually more than 5 km away, whereas geofilt is accurate at 5 km.
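The difference between the two filters can be sketched independently of Solr. The following Python example is a simplified illustration, not Solr's implementation: it uses the haversine formula on a spherical Earth, a naive bounding box that ignores the poles and the antimeridian, and hypothetical coordinates. It shows a point that passes the bounding-box test but lies outside the 5 km circle.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def bounding_box(lat, lon, d_km):
    """Approximate box enclosing the circle of radius d_km around (lat, lon).

    Returns (min_lat, max_lat, min_lon, max_lon); not valid near the poles
    or across the antimeridian.
    """
    dlat = math.degrees(d_km / EARTH_RADIUS_KM)
    dlon = math.degrees(d_km / (EARTH_RADIUS_KM * math.cos(math.radians(lat))))
    return (lat - dlat, lat + dlat, lon - dlon, lon + dlon)


def in_bbox(center, d_km, point):
    """Loose filter: is the point inside the circle's bounding box?"""
    min_lat, max_lat, min_lon, max_lon = bounding_box(center[0], center[1], d_km)
    return min_lat <= point[0] <= max_lat and min_lon <= point[1] <= max_lon


def in_circle(center, d_km, point):
    """Exact filter: is the point within d_km of the center?"""
    return haversine_km(center[0], center[1], point[0], point[1]) <= d_km


center = (45.15, -93.85)          # hypothetical query point
near_corner = (45.19, -93.793)    # hypothetical store close to a box corner
print(in_bbox(center, 5, near_corner), in_circle(center, 5, near_corner))
```

The box test needs only four comparisons per candidate, which is why a loose filter can be attractive when a few false positives near the corners are acceptable.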
As a back-end server, Solr has been used to index and search metadata services in the index node of the Earth System Grid Federation (ESGF), which can access distributed geospatial data [88]. In terms of geospatial data processing, Solr is widely used as a search engine rather than for data storage [89,90], so it must be combined with other databases.

Comparisons of Geospatial Data Processing in NoSQL Databases
To compare the 10 NoSQL databases, their geospatial features are summarized in Table 4, which covers geometry primitives, the main geometry functions, spatial indexes, query language, and data format. Table 4 shows that nine of the 10 databases, all except HBase, support geospatial data and provide dedicated functions or procedures. However, perhaps precisely because HBase lacks built-in geospatial features and functions, a considerable amount of geospatial research has been carried out on HBase.
In terms of geometry objects, most NoSQL databases support multiple geometry objects; the exceptions are Amazon DynamoDB, HBase, Neo4j, and Redis. DynamoDB and Redis only support the point object, while HBase does not support any geometry objects. Neo4j also only supports the point geometry object natively, but an extension library, Neo4j Spatial, supports seven geometry objects. MongoDB, DynamoDB, and Elasticsearch support more comprehensive geometry functions than do other NoSQL databases, covering point, distance, and range operations. For spatial indexing, the geohash is the most common method and has been adopted in document databases, column-oriented databases, and key-value databases. Tree structures are another common approach to indexing spatial data in NoSQL databases. In terms of query language, some NoSQL databases, including MongoDB, HBase, Redis, and Amazon DynamoDB, do not support a declarative query language; instead, they support REST queries and proprietary APIs for building and issuing queries. In terms of supported geospatial data formats, eight of the 10 databases, all except HBase and Splunk, provide the general GeoJSON or WKT data formats.
In NoSQL databases, data models (including multi-models and search engines) can be classified into four major categories: key-value, graph model, wide column, and document stores. In fact, all of these data models can handle and manage geospatial data. However, different NoSQL databases can store and represent geospatial data in different ways according to the specific data model.
Graph databases are based on nodes (0D) and edges (1D), similar to the graph topology model relevant for spatial data. Two nodes and an edge (the connection between the two nodes) can conveniently represent two road crossings and the road between them as a topological relationship. However, graph databases do not support graph topological faces (2D), i.e., the space bounded by edges. Since Neo4j is a graph database, it can natively handle the 0D and 1D graph topological properties of geospatial data and can provide fast traversal operations [21].
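The node-and-edge view of a road network can be sketched in a few lines of Python. This is a plain in-memory illustration of the data model, not Neo4j code; the crossing names and road segments are hypothetical, and the traversal is an ordinary breadth-first search.

```python
from collections import deque

# Hypothetical road network: nodes are crossings (0D), edges are road segments (1D).
roads = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D")]


def build_graph(edges):
    """Build an undirected adjacency map from a list of (crossing, crossing) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def fewest_segments(graph, start, goal):
    """Breadth-first traversal; returns a route with the fewest road segments, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in sorted(graph.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


graph = build_graph(roads)
route = fewest_segments(graph, "A", "D")
print(route)
```

Nothing in this structure describes the area enclosed by the roads A-B-C-D-E, which mirrors the missing 2D face noted above.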
In the key-value model (Redis), a historical building and its location (longitude and latitude), history, and construction information can be easily stored: the building name is the key, and the other information is the stored value. Redis is an in-memory store in which data loading and workload execution are very fast [13]. Because of its in-memory storage, Redis is generally applied in specific systems that must display data in real time and do not require the persistent storage of all data, such as ship tracking [16].
However, because of the characteristics of these two data models, Neo4j and Redis mainly support the point geometry object. Therefore, key-value databases (Redis) [7,91] and graph databases (Neo4j) [53][54][55] only provide limited geospatial queries and functions, including distance calculations and location queries. Given the complexity of geospatial data, especially polyline and polygon objects, these restricted geospatial data models limit the range of applications for which key-value databases (Redis) and graph databases (Neo4j) are useful.
Document databases can differ in the details, but all document databases encode and encapsulate information into documents in a certain standard format. The common standard encoding formats include Extensible Markup Language (XML), JavaScript Object Notation (JSON), and Binary JSON (BSON). For geospatial data, document databases such as MongoDB, Couchbase, Amazon DynamoDB, and Elasticsearch use the GeoJSON format.
Document databases have complex relationships with other NoSQL databases. For example, the search engine Elasticsearch provides ample operations for documents and is considered to be a document-oriented database. Additionally, a document database can sometimes be viewed as a key-value database, such as Redis, Couchbase, and Amazon DynamoDB. MongoDB is not a key-value database, but it uses the concept of key-value pairs, and documents are accessed using a key. Although document databases are intimately related to key-value databases, document-oriented databases, such as MongoDB, Couchbase, and Elasticsearch, process and manage geospatial data more effectively than do key-value databases [7]. This is mainly because document databases have more flexible queries for retrieving geospatial data than do key-value databases, including proximity queries and embedded topology analysis functions [17], and through the GeoJSON format, many document databases easily support or extend geospatial data management. Furthermore, document databases perform well in geospatial data queries [18,31], spatial data retrievals [33], and in terms of response times for loading big geospatial data [32]. MongoDB also has the best query time for node queries compared to Neo4j and PostgreSQL [92].
The wide column databases, Cassandra and HBase, store data tables by column instead of by row, and they are open-source, non-relational, distributed databases. HBase does not natively support geospatial processing; however, through MapReduce [75,76,93] and the design of new index structures [69,70,78,79] and storage models/schemas [67,73,81,82,83], HBase can now process geospatial data for different applications. As with HBase, Cassandra can also use Hadoop MapReduce for geospatial data processing [59]. However, HBase is a column-oriented key-value data store, and Cassandra is essentially a hybrid between a key-value and a tabular database management system. Neither supplies a way to query by column and value, and query performance mainly depends on limited keys, so column-family databases can be used efficiently for special geospatial applications that need simple geospatial queries, mass data insertion, and fast data retrieval [21]. Additionally, most research based on HBase has focused on vector spatial data [66,67,83,93,94], while document databases can handle both raster data [7] and vector spatial data [36]. Because wide column databases do not have sufficient built-in functions and queries to support geospatial data processing, another weakness of wide column databases (HBase) is that they require extra work to design geospatial indexes and functions [69,78,94]; this may limit the interoperability and sharing of designs compared with databases that have built-in indexes and functions. Built-in spatial indexes and functions make design work more convenient and efficient, but fixed indexes and functions might limit flexibility in some applications. Designers and developers must balance convenience against flexibility, as well as consider the workload and complexity of the design.
Furthermore, geospatial indexing is vital for geospatial queries in NoSQL databases. For better query performance, some researchers have extended current indexing methods for different NoSQL databases, including R-Tree for MongoDB [44,46], geohash extensions for MongoDB [45,47], a graph-based expansion tree (GET) for Neo4j [95], and a new hybrid indexing scheme called HB+-trie for key-value storage [96].
Currently, document and wide column databases receive more academic attention than do graph and key-value databases in terms of geospatial data processing. The research on document databases has mainly concentrated on index improvement [44,45], performance analysis [18,23], and practical applications [38][39][40]. Additionally, due to the high performance of data insertion and retrieval in HBase, many researchers have designed systems and applications for geospatial data based on HBase [63,73,79,82,94].
A basic comparison of the geometry objects, main geometry functions, spatial indexes, and data formats supported by these NoSQL databases is shown in Table 4. A summary of geospatial data processing in different NoSQL databases (based on our literature review and analysis) is listed in Table 5. Of the NoSQL databases, document databases handle geospatial data processing the most effectively, considering the geometry objects they support, their data formats, their query performance, their geospatial functions, their index methods, and the amount of academic attention they receive. The other databases have their own advantages for specific scenarios.

Data Models | Main Characteristics in Terms of Geospatial Processing | Main Applications | Academic Attention
Key-value database | 1. Fast data loading and workload execution; 2. In-memory storage and specific application scenarios; 3. Limited geospatial queries and functions | Tracking applications | Low

Conclusions
In this paper, we summarized the state-of-the-art geospatial data processing used in the 10 most popular NoSQL databases and compared their performances based on geometry objects supported, geometry functions, spatial indexes, data formats, query languages, and use in academic research. Moreover, we analyzed the pros and cons of these NoSQL databases in geospatial data processing. Graph databases and key-value databases tend to express the geometry point object, without enough support for other geometric structures. This limits their geometric functions and applications. Moreover, these two types of databases have received little academic attention in terms of geospatial data processing. Document databases support a variety of geometric structures and provide a richer set of geospatial functions than do graph and key-value databases. Wide column databases only support a limited number of geospatial queries and functions; however, wide column databases have been adopted for many applications and have been studied extensively by academics, as have document databases. On the basis of our literature review, which included a systematic comparison of NoSQL database characteristics, we conclude that document databases are the best platform for geospatial data processing, as they load fast and have a good execution time, good query performance, and abundant geospatial functions and index methods. They have also received much academic attention.
Depending on the application scenario, graph databases, key-value databases, and wide column databases also have their own advantages. Additionally, geometry surface calculations and volume processing are not handled in the existing NoSQL databases. This could be a new direction for spatial processing research.
Author Contributions: Resources, methodology, and data collection, Dongming Guo; formal analysis and investigation, Dongming Guo and Erling Onstein; writing, original draft preparation, Dongming Guo; writing, review and editing, Dongming Guo and Erling Onstein. All authors have read and agreed to the published version of the manuscript.
Funding: NTNU Open Access publishing funds covered the article processing charges.

Conflicts of Interest: The authors declare no conflict of interest.