DSTree: A Spatio-Temporal Indexing Data Structure for Distributed Networks

The widespread availability of tools to collect and share spatial data enables us to produce a large amount of geographic information on a daily basis. This enormous production of spatial data requires scalable data management systems. Geospatial architectures have changed from clusters to cloud architectures and more parallel and distributed processing platforms to be able to tackle these challenges. Peer-to-peer (P2P) systems as a backbone of distributed systems have been established in several application areas such as web3, blockchains, and crypto-currencies. Unlike centralized systems, data storage in P2P networks is distributed across network nodes, providing scalability and no single point of failure. However, managing and processing queries on these networks has always been challenging. In this work, we propose a spatio-temporal indexing data structure, DSTree. DSTree does not require additional Distributed Hash Tables (DHTs) to perform multi-dimensional range queries. Inserting a piece of new geographic information updates only a portion of the tree structure and does not impact the entire graph of the data. For example, for time-series data, such as stored sensor data, the DSTree performs around 40% faster in spatio-temporal queries for small and medium datasets. Despite the advantages of our proposed framework, challenges such as a 20% slower insertion speed and the lack of semantic query capabilities remain. We conclude that more significant research effort from GIScience and related fields in developing decentralized applications is needed. One of the requirements is the standardization of different geographic information when sharing data on the IPFS network.


Introduction
With the advancement of Internet-based services and sensors, such as the widespread adoption of GPS (Global Positioning System)-based sensors, location-based services, improvements in computational data processing, and satellite imagery, a large amount of novel spatial information is produced daily [1]. The widespread availability of tools to collect and share spatial data enables individuals and small communities to produce their own digital spatial content (e.g., OpenStreetMap contributions tripled between 2012 and 2017 [2]). This enormous production of spatial data requires scalable data management systems [3]. The data infrastructure technology supporting spatial data management and processing has changed from standalone relational database systems to spatial data warehouses that support a variety of data formats and analytical workloads [4] and from centralized infrastructures to decentralized and peer-to-peer (P2P) systems. In addition to the new system architectures, new data structures have been proposed (e.g., HDF [5], Data-Cubes [6], GeoParquet [7]), and spatial data management and analytic frameworks (e.g., Apache Iceberg [8], Digital Earth [9]) have emerged to manage and analyze large-scale data with high temporal and spatial resolutions.
Geospatial architectures are another area of technological change developed to handle spatial data management challenges. They have changed from clusters to cloud architectures and more parallel and distributed processing platforms (e.g., Spark [10], Hadoop [11]).
In the realm of network and application architecture, P2P systems have been established in several application areas. In the past decade, P2P systems have been used widely in web3 [12], blockchains [13], and crypto-currencies [14]. The combination of P2P file-sharing systems with blockchains provides scalability, security, immutability, and append-only attributes for sharing information amongst nodes on a network [15]. These attributes make P2P networks suitable for content distribution and service discovery applications [16][17][18][19]. However, the main limitation of existing systems is that they can only locate data on the network based on a key value using a DHT [20]. While DHTs have been used as a main building block for P2P applications, they are seriously deficient in one regard: they only directly support exact match queries and do not allow users to query data by range requests [21][22][23][24]. The multi-dimensionality of spatio-temporal data is one of the big challenges in retrieving and querying these data in P2P networks. Querying spatio-temporal data requires the ability to perform range queries, which is not currently supported. The rapid increase in spatio-temporal data collection calls for a new auxiliary indexing structure. Such indexing structures are responsible for tracking the behavior of moving objects through space [25,26]. These indexing methods allow P2P architectures to find and retrieve contents based on the user's filters and address more complex data sharing needs, facilitating data management, query processing, and delivering data to the end-user.
While much work has been carried out towards expediting search in file-sharing P2P systems, issues concerning spatial indexing in P2P systems are significantly more complicated due to cases such as overlaps between spatial objects, avoidance of data scattering, and the complexity of spatial queries [27]. One-dimensional data have mainly been queried using DHTs in P2P networks [28][29][30]. DHTs are not yet designed for complex spatial queries (e.g., range query, k-nearest neighbor query), and only support the location of data items based on a key value (i.e., equality lookups) [20,31]. For multi-dimensional data, there have been multiple approaches; the first approach includes partitioning data into one-dimensional indexes using space-filling curves or kd-tree-based methods and indexing them using DHT methods (e.g., [32][33][34]). These methods usually work best with static data, and dynamic content relocation (locality) is dependent on the accuracy of the space-filling curves [34].
Another approach to apply range queries to P2P networks is to use multiple DHTs in the network (e.g., [35][36][37]). This method needs a higher level of network and node structure manipulation and is not commonly used in large-scale projects due to its interoperability limitations [38]. A third approach is to construct traditional indexes that have been used in centralized environments and distribute these indexes on the P2P network (e.g., [20,21,39,40]). This approach is implemented by constructing a tree, splitting it into parts, and maintaining parts of semi-independent trees at each peer. Prefix Hash Tree (PHT) [21] and, similarly, P-Tree [20] use the same approach by storing a fraction of the overall tree on each peer. In PHT, each node of the tree is labeled with a prefix which is defined recursively. Given a node with the label l, its left and right children are labeled as l0 and l1, respectively. This pattern constructs a tree structure and enables range queries on a dataset [21].
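The recursive labeling used by PHT can be illustrated with a short sketch. It is not the PHT implementation from [21]; it only shows how child labels are derived and how the longest common prefix of two keys identifies the smallest subtree that covers a range. The function names and the fixed key width are assumptions made for this example.

```python
def children(label: str) -> tuple:
    """Child labels of a node labeled `label`: l0 (left) and l1 (right)."""
    return label + "0", label + "1"


def to_bits(key: int, width: int) -> str:
    """Fixed-width binary string for a non-negative integer key."""
    return format(key, f"0{width}b")


def range_prefix(lo: int, hi: int, width: int) -> str:
    """Longest common prefix of lo and hi.

    The node carrying this label is the root of the smallest subtree
    that contains every key in [lo, hi] (hypothetical helper).
    """
    a, b = to_bits(lo, width), to_bits(hi, width)
    n = 0
    while n < width and a[n] == b[n]:
        n += 1
    return a[:n]


left, right = children("01")       # labels "010" and "011"
subtree = range_prefix(8, 11, 4)   # keys 1000..1011 share the prefix "10"
```

A range query would then start at the node labeled by `range_prefix` and descend only into children whose labels can still contain keys in the range.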
In this work, we propose a spatio-temporal indexing data structure that works in the data layer and uses a distributed InterPlanetary File System (IPFS) network. Our method is closer to the approach of Ramabhadran et al. [21] and does not require additional DHTs to perform multi-dimensional range queries. The rest of this paper is divided into two main sections. First, we introduce the Distributed Spatio-Temporal Tree (DSTree) as a data structure to perform spatio-temporal range queries on a dataset, and we examine its features and performance by comparing it to other existing trees. Second, we look at the integration of DSTree with distributed networks and propose a system architecture to perform queries on IPFS using DSTree. The originality of this work lies in the ability to run spatio-temporal queries on data shared on the IPFS system. This sort of query allows sharing and querying spatio-temporal data on P2P systems without the need for any third-party central entities.

Spatio-Temporal Data Indexing Methods
P2P multidimensional query processing refers to the execution of advanced query operators over multidimensional data stored in a distributed system [41]. The data retrieved from a query should be exact and complete. Exactness means the query result should not be approximate: the retrieved data should belong exactly to the query result set. This means that if we run a query on the same dataset on a centralized system, the results should be exactly the same as when we run the query on the distributed system. A basic element of geospatial technology can be described by three main components: location in space, location in time, and the attributes of that location in space-time [42]. In this work, our focus is on a methodology that allows users to retrieve a geographic object from a P2P network based on space or time queries.
The spatio-temporal indexing methods in central systems are mainly performed at an abstract level (Figure 1). They are used to improve query performance on the datasets. These indexes usually work as a separate layer on top of the data layer itself. Queries are usually performed against the indexes, and the actual references to the geographic features are then retrieved from the index. In contrast, our method stores data on the nodes on the network using spatial indexing methods in both abstract and physical storage layers. For example, the purpose of indexing can also be to distribute data that are closer in space on the nodes that are closer in the network [20] (Figure 1, middle). Over the last couple of decades, many spatio-temporal access methods have been developed. There have been several approaches for spatio-temporal indexing so far [25,26,[43][44][45][46][47][48]. Handling temporal data in GIS ranges from time-stamping GIS layers [49] to more object-oriented approaches such as time-stamping events and processes [44,50]. Another category of spatio-temporal models is trajectory-based access models, which track the changes in the geographic object topologies over time [47]. In central systems, indexing models such as oct-tree [51] are sometimes used to process spatio-temporal queries [47,52,53]. When oct-trees are used for indexing spatial data, each geographic object is considered as a cuboid. The spatial dimension of the data is considered as two dimensions of each cuboid. The third dimension of a cuboid is considered a time interval. There have been some newer approaches to the query process of spatio-temporal data using blockchains. For example, ref. [54] used a block-DAG-based index traversal algorithm to handle spatio-temporal queries on a block-DAG. However, the main issue with the blockchain-based spatio-temporal indexes is the limited data storage capability on the blockchains [55].
Figure 1. The spatial index maintenance is handled by an RDBMS [56] (left) or super nodes (middle) in some studies [20], or by the proposed blockchain-based method (right). In the proposed model, each node maintains a spatial index and the latest version of the index is always published on the blockchain. Each node only stores and serves the features that it needs using IPFS. Rows show different abstraction levels of geographic information from real-world phenomena to machine language.
A decentralized spatial indexing technique must be scalable enough to handle hundreds of thousands of peers and also dynamic enough to deal with peers joining or leaving the system at any time. Another feature of such structures is their ability to preserve the locality and directionality of multidimensional information. Locality implies that multidimensional information is stored in neighboring nodes, while directionality implies that the index structure preserves orientation. The notions of locality and directionality are very important. If an index structure preserves these properties, then searching in the index corresponds to searching in the multidimensional space, which can greatly reduce query evaluation cost [27]. R-tree-based indexes can efficiently answer various types of multidimensional queries, especially range queries [57]. In addition, a spatio-temporal indexing method is required to support two types of topological relations. The first set of topological relations includes temporal topologies, which are based on Allen's temporal algebra covered in [58,59]. These relations include seven topologies that are briefly explained in Table 1. A time interval for Geographic Information (GI) can be defined as the duration in which a GI feature exists with a fixed state. This interval can be as small as the few milliseconds that it takes to collect a GI feature or can be a considerably longer time period, as with geological land classifications or land cover. Defining time intervals and how a GI can be attached to a newer time interval depends on the context of the study. The second type of topological relations which needs to be addressed by a spatio-temporal indexing model is spatial topology.
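To make the temporal relations concrete, the sketch below classifies the relation of an interval X to an interval Y. The relation names follow the common statement of Allen's interval algebra (seven base relations plus the inverses of all but "equals"); they are assumptions and may not match the exact labels used in Table 1. Intervals are taken to be (start, end) pairs with start < end.

```python
def allen_relation(x, y):
    """Name of the relation of interval x to interval y.

    Both arguments are (start, end) tuples with start < end. The labels
    are the conventional ones for Allen's algebra, not necessarily the
    ones used in Table 1.
    """
    xs, xe = x
    ys, ye = y
    if xe < ys:
        return "before"
    if ye < xs:
        return "after"
    if xe == ys:
        return "meets"
    if ye == xs:
        return "met-by"
    if xs == ys and xe == ye:
        return "equals"
    if xs == ys:
        return "starts" if xe < ye else "started-by"
    if xe == ye:
        return "finishes" if xs > ys else "finished-by"
    if ys < xs and xe < ye:
        return "during"
    if xs < ys and ye < xe:
        return "contains"
    # Only partial overlaps remain at this point.
    return "overlaps" if xs < ys else "overlapped-by"


relation = allen_relation((1, 4), (2, 6))  # partial overlap, X starts first
```

An index that can return all intervals intersecting a query interval can answer any of these relations by filtering its candidates with such a predicate.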
Each indexing method is optimized for specific types of queries. Our proposed method is more suitable for the storage, query processing, and retrieval of log-based data. These data are produced over time as different events happen. Examples include data collected from a sensor over time, images shared from different locations by users of a VGI tool, or open data shared by government departments, such as crime data shared by the police. Each of these datasets is collected over time and can have different access and geoprivacy levels.
Table 1. Temporal algebra introduced by [58]. X (line border) is the time interval of the first GI and Y (dashed border) is the time interval of the second GI.

X in Relation to Y Y in Relation to X Condition
DSTree is a two-level tree structure. The approach of DSTree is close to the work carried out by [60][61][62]. They constructed a multilevel tree structure to improve the query process of trajectory data. In the method developed by [60], they formed a global space-time subdivision scheme. Sun et al. combined two trees, an R*-Tree and a kd-Tree, to improve the query process on centralized machines [61]. Tao [62] also used a series of temporal quad-trees to handle interval queries in centralized systems. In a DSTree, an interval tree [63,64] is used as the top part of the tree, and a quad-tree [65] is used at the bottom part of the tree.

DSTree: A Spatio-Temporal Index
A unit of GI in our model is defined as a spatio-temporal object which can be represented in the form of Gid(GI, GI_MBR, GI_Time-Interval), where Gid is the identifier of the object. GI is the geographic location, including longitude and latitude (GI_x, GI_y) along the x and y dimensions. GI_MBR is the Minimum Bounding Rectangle (MBR), which is constructed based on its location. GI_Time-Interval is the time interval during which the GI is valid. Each GI also has a set of attributes, p, associated with it.
The interval tree is responsible for temporal queries and the quad-tree is responsible for performing the spatial part of the queries. We define a time interval as a pair of real numbers [t_1, t_2] with t_1 ≤ t_2. Different variants of interval trees are capable of supporting open and half-open intervals. Interval trees are optimized for querying intervals which overlap with a given interval, but can also be used for point queries. Having the ability to query overlapping intervals allows us to query based on the temporal topology in Table 1. During each time interval, we assume that the GI state does not change.
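A minimal sketch of the GI unit described above may help fix the notation. The class and field names are hypothetical, the feature is assumed to be a point (so its MBR degenerates to that point), and the interval is assumed to be closed with t1 <= t2.

```python
from dataclasses import dataclass, field


@dataclass
class GI:
    """One spatio-temporal object; all names here are illustrative."""
    gid: str                  # identifier of the object
    x: float                  # longitude
    y: float                  # latitude
    interval: tuple           # (t1, t2), assumed closed with t1 <= t2
    attributes: dict = field(default_factory=dict)  # attribute set p

    def __post_init__(self):
        t1, t2 = self.interval
        if t1 > t2:
            raise ValueError("interval must satisfy t1 <= t2")

    @property
    def mbr(self):
        """(min_x, min_y, max_x, max_y); degenerate for a point feature."""
        return (self.x, self.y, self.x, self.y)


def overlaps(a, b):
    """True if the closed intervals a and b share at least one instant."""
    return a[0] <= b[1] and b[0] <= a[1]


reading = GI("g1", -80.5, 43.4, (10.0, 20.0), {"sensor": "s-01"})
touching = overlaps(reading.interval, (20.0, 30.0))  # closed ends touch
```

With open or half-open intervals, the comparison operators in `overlaps` would change from `<=` to `<` on the affected side.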

DSTree Index
When constructing a DSTree, it is possible to partition data into spatio-temporal chunks and assign a unique ID to each portion of the data. The proposed DSTree indexes are composed of two parts, as shown in Figure 2. The first component (A) is called the interval tree index, and the second component (B) is called the quad index. The interval tree is a binary tree, so each node can have at most two children. By assigning 1 to the right child and 0 to the left child, a series of IDs is constructed. The number of digits in the interval tree portion of the index equals the temporal level (T). The second section of the DSTree index is a quad index. Each node in a quad-tree has four children. Each child is assigned one of the indexes 00, 01, 10, and 11, and the sequence of these indexes along a path forms the quad index. Figure 3 shows the DSTree index corresponding to quad-tree indexes.
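The two-part index can be assembled mechanically from a path through the tree. The following sketch assumes the path through the interval tree is given as a string of "L"/"R" moves and the path through the quad-tree as a list of quadrant numbers 0 to 3; which geographic quadrant receives which two-bit code is not stated in the text, so that mapping is an assumption.

```python
QUAD_CODES = ("00", "01", "10", "11")  # one two-bit code per quadrant


def interval_index(path: str) -> str:
    """Interval tree index (part A): 0 for a left move, 1 for a right move."""
    return "".join("1" if step == "R" else "0" for step in path)


def quad_index(quadrants) -> str:
    """Quad index (part B): concatenated two-bit codes along the path."""
    return "".join(QUAD_CODES[q] for q in quadrants)


def dstree_index(path: str, quadrants) -> str:
    """Full index in the form Root/A/B."""
    return f"Root/{interval_index(path)}/{quad_index(quadrants)}"


# Two right moves (T = 2), then quadrants 3, 3, 1 in the quad-tree.
index = dstree_index("RR", [3, 3, 1])  # "Root/11/111101"
```

Because part A always has T digits, the boundary between the temporal and spatial portions of an index is unambiguous.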

[Figure 2: layout of a DSTree index, Root / interval tree index (A) / quad index (B), e.g., Root/11/111101]
The top part of the DSTree is a regular interval tree. However, the depth of this tree is controlled by a parameter called temporal level (T). T is responsible for balancing between the top-level tree and the bottom-level tree. Once the top interval tree is constructed, the GI in the leaves (each leaf includes n GI) is used to construct a quad-tree. As a result, we will have one quad-tree at each leaf of the interval tree at the level of T. Regular quad-trees always have an extent equal to the minimum bounding box of the GI inserted into the tree. However, in a DSTree, all of the quad-trees should have the same extent, e.g., equal to (−180, −90, 180, 90) in geographic coordinates. Having the same spatial extent in the bottom part of the DSTree allows us to query data across all of the interval tree leaves. Figure 3 shows an example of a constructed DSTree. Figure 4 shows the points which are used to construct that tree and their relative location and time interval.

Insert
In order to insert a GI Gid(GI_CID, GI_MBR, GI_Time-Interval) into a DSTree, a two-step process is required. First, we need to find the proper node on the interval tree part of the DSTree that Gid can be added to. Afterward, we add the Gid to the proper quad-tree leaf using GI_MBR. To do so, we first obtain the low value of the interval at the root of the DSTree. If the low endpoint of GI_Time-Interval is smaller than the root's low value, the new interval goes to the left sub-tree; otherwise, it goes to the right sub-tree. We continue the same process until the sub-tree level is equal to the temporal level (T) parameter of the DSTree. Once the node is selected, if there is already a quad-tree in the node (the node is spatial), we insert the Gid into the quad-tree using its GI_MBR parameter. If the node is empty, we generate a new quad-tree and then add the Gid to it. Adding the Gid to the quad-tree is similar to the regular quad-tree insert (e.g., see [65]). Once all the steps are completed, we update the max value of the ancestors in the interval tree portion if needed. Algorithm 1 shows the pseudocode for the process.
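The insert procedure can be sketched as follows. This is not Algorithm 1 itself but a minimal illustration under several assumptions: smaller low endpoints go to the left sub-tree (the usual binary-search-tree convention), nodes above level T hold a single GI, GI objects are points given as (gid, x, y, low, high) tuples, and the quad-tree is a simple point quad-tree with a fixed world extent and an arbitrary leaf capacity of 4.

```python
WORLD = (-180.0, -90.0, 180.0, 90.0)


class QuadTree:
    """Minimal point quad-tree over a fixed extent (capacity is assumed)."""

    def __init__(self, extent=WORLD, capacity=4, depth=0, max_depth=16):
        self.extent, self.capacity = extent, capacity
        self.depth, self.max_depth = depth, max_depth
        self.items = []        # (x, y, payload) tuples held by a leaf
        self.children = None   # four sub-quadrants once split

    def insert(self, x, y, payload):
        if self.children is not None:
            self._child_for(x, y).insert(x, y, payload)
            return
        self.items.append((x, y, payload))
        if len(self.items) > self.capacity and self.depth < self.max_depth:
            self._split()

    def _split(self):
        x0, y0, x1, y1 = self.extent
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        quads = [(x0, y0, mx, my), (mx, y0, x1, my),
                 (x0, my, mx, y1), (mx, my, x1, y1)]
        self.children = [QuadTree(q, self.capacity, self.depth + 1,
                                  self.max_depth) for q in quads]
        items, self.items = self.items, []
        for x, y, payload in items:
            self._child_for(x, y).insert(x, y, payload)

    def _child_for(self, x, y):
        x0, y0, x1, y1 = self.extent
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return self.children[(1 if x >= mx else 0) + (2 if y >= my else 0)]


class Node:
    """Interval tree node; `max` is the largest high endpoint below it."""

    def __init__(self, low, high):
        self.low, self.high, self.max = low, high, high
        self.left = self.right = None
        self.gi = None     # single GI stored above level T
        self.quad = None   # quad-tree stored at level T


def insert(node, gi, T, level=0):
    """Insert gi = (gid, x, y, low, high) and return the sub-tree root."""
    gid, x, y, low, high = gi
    if node is None:
        node = Node(low, high)
        if level < T:
            node.gi = gi
            return node
    if level == T:                     # spatial node: use its quad-tree
        if node.quad is None:
            node.quad = QuadTree()
        node.quad.insert(x, y, gid)
        node.low = min(node.low, low)
        node.high = max(node.high, high)
        node.max = max(node.max, high)
        return node
    if low < node.low:
        node.left = insert(node.left, gi, T, level + 1)
    else:
        node.right = insert(node.right, gi, T, level + 1)
    node.max = max(node.max, high)     # update ancestors on the way back
    return node


root = None
for item in [("a", 0, 0, 5, 10), ("b", 10, 10, 1, 3),
             ("c", 20, 20, 2, 12), ("d", -50, -50, 7, 8)]:
    root = insert(root, item, T=1)
```

Updating `max` while the recursion unwinds corresponds to the final step in the text: only the ancestors on the insertion path are touched.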

Time Complexity of an Insert
The time complexity of inserting a Gid into a DSTree involves several steps:
1. Finding the proper node on the interval tree part of the DSTree: The time complexity of this operation depends on the height of the interval tree portion of the DSTree and the number of nodes visited during the traversal. In the worst case, if the interval tree portion is unbalanced, the time complexity can be O(n), where n is the number of intervals in the DSTree.
2. Adding the Gid to the proper quad-tree leaf: Once the proper node on the interval tree part is found, adding the Gid to the proper quad-tree leaf involves traversing the quad-tree structure. The time complexity of this operation depends on the size and structure of the quad-tree. In general, the time complexity of a quad-tree insertion can be O(log m), where m is the number of objects in the quad-tree.
3. Updating the max value of the ancestors of the interval tree portion: After inserting the Gid, the max value of the ancestors of the interval tree portion may need to be updated. The time complexity of this operation depends on the height of the interval tree portion of the DSTree. In the worst case, if the interval tree portion is unbalanced, the time complexity can be O(n), where n is the number of intervals in the DSTree.
Overall, considering all steps, the time complexity of inserting a Gid into a DSTree can be approximated as O(n + log m), where n is the number of intervals in the DSTree and m is the number of objects in the quad-tree. However, the actual time complexity may vary depending on the specific implementation details and the characteristics of the DSTree and quad-tree.

Delete
Deleting GI items from a DSTree is a relatively complex process due to the complexities of removing intervals from a regular interval tree. After deleting a Gid from the DSTree, if the node containing that Gid contains no more objects, that node may be deleted from the tree. This involves promoting a node further from the leaf to the position of the node being deleted, which results in the reconstruction of the top part of the DSTree (for details about deleting items from interval trees, see [64], pp. 348-357). Algorithm 2 shows this process.

Time Complexity of a Delete
The time complexity of deleting items from a DSTree involves several factors:
1. Finding the node containing the Gid: The time complexity of this operation depends on the structure of the DSTree. In the worst case, if the DSTree is unbalanced, the time complexity can be O(n), where n is the number of nodes in the DSTree.
2. Deleting the Gid from the node: The time complexity of deleting the Gid from the node depends on the data structure used to store intervals in the node. For interval trees, the deletion process can have a time complexity of O(log i), where i is the number of intervals in the node.
3. Deleting the node from the DSTree if it contains no more objects: If the node contains no more objects after deleting the Gid, it may need to be deleted from the tree. Deleting a node from a tree can involve restructuring the tree, which can have a time complexity of O(log n), where n is the number of nodes in the tree.
Overall, since these steps run one after another, the time complexity of deleting items from a DSTree can be approximated as O(n + log i), where n is the number of nodes in the DSTree and i is the number of intervals. However, the actual time complexity may vary depending on the specific implementation details and the characteristics of the DSTree.

Query
A spatio-temporal range search is a query for geographic objects that intersect a bounding box, S = (Min_x, Min_y, Max_x, Max_y), in two-dimensional space and are also in a temporal topological relation, TP, with a time interval, I[t_1, t_2] [66]. There can be three main variations of queries on a DSTree: queries with both a temporal interval and a spatial extent, queries with only a temporal interval, and queries with only a spatial extent. Here, we only cover the first variation. The approach to the other two variations is similar and is explained in more detail in Figure 5. In order to perform a spatio-temporal range query on a DSTree, it is required to first query the interval tree portion of the DSTree. If I is in TP relation with the root's interval, we add the root's interval to the candidate node list. If the left child of the root is not empty and the max value of the sub-tree in the left child is greater than I's low value, recur for the left child; otherwise, recur for the right child. Once the candidate nodes are selected, if the selected nodes have a quad-tree as their sub-tree, we perform a quad-tree search for S; otherwise, we only check for the intersection of the selected node with the bounding box, S. Algorithm 3 shows this process.
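A compact sketch of the combined query is given below. It departs from Algorithm 3 in a few labeled ways: the temporal relation TP is fixed to "intersects", the traversal collects every overlapping candidate (visiting the left sub-tree whenever its max value allows an overlap and the right sub-tree whenever the node's low value does), and a plain list stands in for the quad-tree at level T so that the example stays short. GI objects are assumed to be points given as (gid, x, y, low, high) tuples.

```python
class Node:
    def __init__(self, low, high):
        self.low, self.max = low, high
        self.left = self.right = None
        self.gi = None       # single GI stored above level T
        self.bucket = None   # list standing in for the quad-tree at level T


def insert(node, gi, T, level=0):
    """Helper that builds the tree; smaller low endpoints go left."""
    gid, x, y, low, high = gi
    if node is None:
        node = Node(low, high)
        if level < T:
            node.gi = gi
            return node
    if level == T:
        node.bucket = (node.bucket or []) + [gi]
        node.low, node.max = min(node.low, low), max(node.max, high)
        return node
    if low < node.low:
        node.left = insert(node.left, gi, T, level + 1)
    else:
        node.right = insert(node.right, gi, T, level + 1)
    node.max = max(node.max, high)
    return node


def in_box(x, y, box):
    min_x, min_y, max_x, max_y = box
    return min_x <= x <= max_x and min_y <= y <= max_y


def query(node, box, interval, out=None):
    """Collect ids of GI that intersect `interval` and fall inside `box`."""
    out = [] if out is None else out
    if node is None:
        return out
    lo, hi = interval
    if node.max < lo:        # nothing in this sub-tree ends late enough
        return out
    candidates = node.bucket if node.bucket is not None else [node.gi]
    for gid, x, y, low, high in candidates:
        if low <= hi and lo <= high and in_box(x, y, box):
            out.append(gid)
    query(node.left, box, interval, out)
    if node.low <= hi:       # right sub-tree only holds lows >= node.low
        query(node.right, box, interval, out)
    return out


root = None
for item in [("a", 0, 0, 5, 10), ("b", 10, 10, 1, 3), ("c", 20, 20, 2, 12),
             ("d", -50, -50, 7, 8), ("e", 15, 15, 20, 30)]:
    root = insert(root, item, T=1)

hits = query(root, (-1, -1, 25, 25), (2, 6))
```

In a full implementation the linear scan over `bucket` would be replaced by a quad-tree range search for S, and the overlap test by the requested relation TP.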

Time Complexity of a Query
The query process has three key operations, as follows:
1. Querying the interval tree portion of the DSTree: This operation involves traversing the interval tree portion of the DSTree recursively. The time complexity of this operation depends on the height of the interval tree portion and the number of nodes visited during the traversal. If the interval tree portion is balanced, the time complexity is O(log n), where n is the number of intervals in the DSTree.
2. Recursing down the left or right child nodes: This operation involves recursively traversing down the left or right child nodes of each candidate node. The time complexity of this operation depends on the structure of the DSTree and the distribution of intervals. In the worst case, if the DSTree is unbalanced, the time complexity can be O(i), where i is the number of intervals in the DSTree.
3. Quad-tree search: If the selected nodes have quad-tree sub-trees, a quad-tree search for S is performed. The time complexity of the quad-tree search depends on the size and structure of the quad-tree and the number of objects in the search area. In general, the time complexity of a quad-tree search is O(m + k), where m is the number of objects in the search area and k is the number of objects found.
Overall, considering all operations and assuming a balanced interval tree portion, the time complexity of the algorithm can be approximated as O(log i + m + k), where i is the number of intervals in the DSTree, m is the number of objects in the search area, and k is the number of objects found during the search.

DSTree Construction from Bulk Data
Since DSTree is a two-level tree, the performance of the insert, delete, and construction operations depends on both the top level and the bottom level of the tree. Construction of the DSTree from scratch for bulk data is a relatively straightforward process. First, an interval tree is constructed based on the existing data. Once data are inserted into the hierarchical node structure, the algorithm traverses down from the root of the interval tree until the tree level equals the temporal level (T). Once the appropriate level is reached in the interval tree, all of the items under the sub-trees (left and right branches) of the selected node are collected into one single node and a quad-tree is constructed in that node.
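One way to sketch bulk construction is to sort the items by their low endpoint, split at the median down to level T, and collect everything that remains at level T into a single node. This is an illustration rather than the construction routine used in the experiments: nodes are plain dictionaries, a list stands in for the quad-tree built at level T, and items are assumed to be (gid, x, y, low, high) tuples.

```python
def build(items, T, level=0):
    """Build the top part of a DSTree from items pre-sorted by low endpoint."""
    if not items:
        return None
    if level == T:
        # Everything that would sit below this node is collected here; a
        # quad-tree over these items would be constructed at this point.
        return {"low": items[0][3], "max": max(i[4] for i in items),
                "bucket": list(items), "left": None, "right": None}
    mid = len(items) // 2
    left = build(items[:mid], T, level + 1)
    right = build(items[mid + 1:], T, level + 1)
    node_max = max([items[mid][4]] + [c["max"] for c in (left, right) if c])
    return {"low": items[mid][3], "max": node_max, "gi": items[mid],
            "left": left, "right": right}


# Seven items with low endpoints 1..7, each lasting two time units.
data = sorted(((f"g{i}", 0.0, 0.0, i, i + 2) for i in range(1, 8)),
              key=lambda item: item[3])
tree = build(data, T=1)
```

Splitting at the median keeps the top part balanced, so the depth of the interval portion is exactly T wherever enough data exist.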

Performance Metrics
The behavior of the proposed tree structure is measured using a number of experiments with real data. These experiments are intended to reflect the conditions of common tasks involved in spatial and temporal queries of the data. Results of multiple models are compared to the method proposed. In the first set of experiments, data on the occurrence of crime from the Waterloo Regional Police Service (https://www.wrps.on.ca/en/aboutus/reports-publications-and-surveys.aspx, accessed on 15 February 2024) are used. These data detail all the police-reported occurrences for the calendar year. The time frame of these data is from 2017 to 2022. The data include occurrence date and time, response time, and geographic coordinates of the occurrence. In this experiment, only the location of each event and the response time of each event are used. In order to have a comparison between DSTree and the other existing models, we have used three other methods to process spatio-temporal queries. In choosing each method, the availability of source code to perform the tests was considered. Oct-tree is one of the methods used in the experiments, similar to the work carried out by [47,52,53], in which the third dimension of cuboid data is considered as the time interval (source code available at [67]). The second access method used is a regular quad-tree (source code available at [68]). In this method, only a spatial query is performed, and then, to obtain the exact result set, all the results from the spatial query are filtered based on the temporal parameters. The third method is an interval tree (source code available at [69]). In this method, in contrast to the quad-tree method, only a temporal query is performed, and once the results are extracted from the tree structure, the spatial filter is applied to them to obtain the exact results. The above experiment is applied to batches of 50 k, 100 k, 200 k, 400 k, 800 k, and 1 million points. Each test was executed five times then repeated ten times, and the average execution time was measured. The spatio-temporal query was constant over the entire experiment. The main query for these different models is defined as follows: find the events whose location has an Intersection topology relation with a spatial extent defined in terms of Max_x, Min_x, Max_y, and Min_y, and whose response time has a topological relation T with a temporal extent defined in terms of Max_t and Min_t, where Max_x, Min_x, Max_y, Min_y are the spatial extent of the entire dataset, Max_t, Min_t are the temporal extent of the data, and T is the temporal topology from Table 1.
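The measurement protocol can be approximated with Python's standard `timeit` module. The sketch below reads "executed five times then repeated ten times" as five executions per measurement and ten measurements, which is an interpretation; the source does not state which timing tool was used, and `average_runtime` is a hypothetical helper.

```python
import timeit
from statistics import mean


def average_runtime(fn, number=5, repeat=10):
    """Average seconds per call over `repeat` batches of `number` calls."""
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    return mean(total / number for total in totals)


# Stand-in workload; a real run would call the index's query function
# with the constant spatio-temporal query.
seconds = average_runtime(lambda: sum(range(1000)))
```

Keeping the query constant across dataset sizes, as the experiment does, means differences in `seconds` reflect the index structure and data volume rather than the query itself.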
Figure 6 shows the results of the six temporal queries. In these six queries, a static spatial extent is used. As Figure 6 shows, DSTree performs well for query processing in medium-sized datasets compared to the other indexing methods. The metrics of the DSTree in all six temporal topologies are close to those of the quad-tree method. Oct-tree and interval tree also show similar performance metrics to each other.
The second experiment measures the number of visited points needed to answer a constant spatio-temporal query. In the experiment, only objects that the index traverses and checks to answer each query are counted. Figure 7 shows the results of this experiment. In this figure, DSTree has fewer visited points compared to quad-tree. The reason why they nevertheless have close timing metrics is that the quad-tree is less computationally heavy compared to DSTree. As Figure 7 shows, the DSTree has visited fewer points than oct-tree, and the oct-tree curve suddenly flattened out after 800 k points. This is probably related to the spatial distribution of the points in the area, which are clustered and result in a better performance of oct-tree. Further studies are required to address this kind of edge case. However, despite this number of visited points, the time to create an oct-tree is 20% higher, which is discussed in the next section. In addition, since the oct-tree uses functions which involve comparisons in the third dimension, its query performance does not outperform DSTree overall.

IPFS
The first generation of P2P systems, namely file-sharing applications such as BitTorrent, support only keyword lookups and mostly provide no load balancing. The second generation is mainly structured P2P systems supporting basic key-based routing [31]. The InterPlanetary File System (IPFS) is a protocol and a P2P network for storing and sharing data in a distributed file system. IPFS uses content addressing to uniquely identify each file in a global namespace connecting all computing devices [70]. Each file on the IPFS network has a unique hash address which is used as a reference to request it from the network. Any user in the network can serve a file by its content address, and other peers in the network can find and request that content from any node that has it using a distributed hash table (DHT) [71].
At its core, IPFS is built on top of a data structure called InterPlanetary Linked Data (IPLD) [70]. The IPLD model is a set of specifications in support of decentralized data structures for the content-addressable web. Content IDs (CIDs) are hashes generated to allow the user to interact with IPFS in a trustless manner and recover their data. IPLD deals with decoding these hashes so that users can access their data. When new content is added to the IPFS network, that content is separated into several chunks and stored in different blocks. To reconstruct the whole file, a Directed Acyclic Graph (DAG) connects each bit of content together. In a DAG, we can only move from parent nodes to child nodes, as each edge is oriented. Hierarchical data in particular are very naturally represented via DAGs.
IPLD creates a series of links to data internally but also allows users to create those links themselves through simple data structures that can be stored on IPFS. This capability allows us to store a DAG (in our case, a DSTree) on IPFS. It also allows users to request a portion of the data from the network without the need to download the entire dataset. For example, a user is able to store the graph shown in Figure 3 as an IPLD object, as shown in Table 2. IPLD's capability to store and retrieve DAGs allows us to store spatio-temporal data as a graph structure, and as a result, we can request them based on the query parameters. DSTree only keeps a CID reference to the actual feature in each tree leaf. So, depending on the query parameters, we only need to retrieve a portion of the GI or a specific subtree of the DSTree. In IPLD, graph nodes in a path are separated using /. For example, a DSTree leaf can be represented as an IPLD path of the form DSTree_CID/Interval-tree_index/quad-tree_index/, where DSTree_CID is the CID of the root of the DSTree, Interval-tree_index is the interval tree portion, and quad-tree_index is the second portion of the DSTree index. Under each leaf, there will be a series of GI objects. Each GI can be stored separately on IPFS, and its own CID, GI_CID, can be used as a reference to the object itself. So to access a single feature, we can use an IPFS address similar to DSTree_CID/Interval-tree_index/quad-tree_index/GI_CID. Note that GI_CID is generated by IPFS based on the content of the GI, and to have access to it, we need to query it from the DSTree.
Table 2. An example of how IPFS stores a DAG (based on the graph in Figure 3) and how to request a portion of the graph from IPFS. Qmb...R is DSTree_CID, which is a hash generated based on the root content of the entire DAG from the DSTree. Qmb...d or Qmb...c is a GI_CID, which is a hash generated based on the content of a single GI object.
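The path scheme can be illustrated without an IPFS node by treating a nested dictionary as a stand-in for the IPLD DAG. The identifiers "ROOT" and "GI-A" below are hypothetical placeholders rather than real CIDs, and `dstree_path` and `resolve` are illustrative helpers, not part of any IPFS or IPLD API.

```python
def dstree_path(root_cid, interval_index, quad_index, gi_cid=None):
    """Join the DSTree index parts into a '/'-separated path."""
    parts = [root_cid, interval_index, quad_index]
    if gi_cid is not None:
        parts.append(gi_cid)
    return "/".join(parts)


def resolve(dag, path):
    """Walk a nested dict (standing in for the DAG) along the path."""
    node = dag
    for segment in path.split("/"):
        node = node[segment]
    return node


# Toy DAG: root -> interval tree index -> quad index -> GI reference.
dag = {"ROOT": {"11": {"111101": {"GI-A": {"name": "sensor reading"}}}}}

leaf = resolve(dag, dstree_path("ROOT", "11", "111101"))
feature = resolve(dag, dstree_path("ROOT", "11", "111101", "GI-A"))
```

Resolving a shorter path returns a whole subtree, which mirrors how a client can fetch only the portion of the index that a query touches.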

Distributed Network Integration
In order to process queries on distributed networks (in our case IPFS), the data must be stored in a DAG format. We use DSTree to construct a DAG, and once the tree structure is constructed, an IPLD graph is formed from the DSTree graph. Then, the IPLD object is stored on the IPFS network, and the CID of the uploaded content is used as a root gateway to access the entire tree structure. DSTree in this system acts as the main index structure for answering spatio-temporal queries. Since DSTree is an indexing structure, it does not store the actual GI; it only stores the CID of each individual GI as a reference to the object itself. This provides a lightweight graph that can be used by each client.

Data Management
As mentioned earlier, the DSTree only stores a CID reference to the actual GI. The GI can be in any format (e.g., GeoJSON, TopoJSON, or other feature-level standards). In constructing the DSTree, each GI is first uploaded to IPFS as a regular file or IPLD object. For each GI, we then have a tuple (GI_CID, GI_MBR, GI_Time-Interval). This tuple is inserted into the DSTree, and the related DAG is constructed. Once all the GI objects are added to the DSTree, the graph structure is converted to an IPLD object and uploaded to IPFS, and the pair (DSTree_CID, DSTree_Metadata) is shared between users.
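The data flow in this paragraph can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation used in the paper: the index is a plain nested dictionary rather than an interval tree over quad-trees, the two key functions are supplied by the caller as stand-ins for the real temporal and spatial indexing, and `stand_in_cid` is a SHA-256 digest used in place of a real IPFS CID. All names are hypothetical.

```python
import hashlib
import json
from typing import Callable, Dict, Iterable, List, NamedTuple, Tuple


class GIRecord(NamedTuple):
    gi_cid: str                              # reference to the uploaded GI
    mbr: Tuple[float, float, float, float]   # (min_x, min_y, max_x, max_y)
    interval: Tuple[int, int]                # (start, end) timestamps


Tree = Dict[str, Dict[str, List[str]]]


def build_index(records: Iterable[GIRecord],
                temporal_key: Callable[[Tuple[int, int]], str],
                spatial_key: Callable[[Tuple[float, float, float, float]], str]) -> Tree:
    """Group GI references by temporal key, then by spatial key."""
    tree: Tree = {}
    for rec in records:
        leaf = tree.setdefault(temporal_key(rec.interval), {}) \
                   .setdefault(spatial_key(rec.mbr), [])
        leaf.append(rec.gi_cid)  # only the reference is stored, never the GI
    return tree


def stand_in_cid(payload: bytes) -> str:
    """Content-derived identifier; NOT a real IPFS CID."""
    return "sha256-" + hashlib.sha256(payload).hexdigest()


def publish(tree: Tree, metadata: dict) -> Tuple[str, dict]:
    """Serialize the index deterministically and return (root id, metadata)."""
    payload = json.dumps(tree, sort_keys=True).encode("utf-8")
    return stand_in_cid(payload), metadata
```

Because the identifier is derived from the serialized content, adding or removing a record changes the returned root identifier, which is the property the sharing workflow relies on.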

Metadata
Metadata are usually defined as data about data [72]. To provide interoperability between different systems, metadata objects must be included within shared content. In the DSTree, DSTree_Metadata is used to store information related to the dataset and can include general spatial metadata objects (e.g., see [73][74][75]). The metadata keys listed in Table 3 are necessary in order to provide minimum interoperability when sharing information using DSTree on IPFS.

The proposed method to process queries on IPFS networks consists of four main components. Figure 8 shows the flow of the communication between users on a distributed network in order to query, retrieve, and store GI. The data sharing process on IPFS starts with a user, User1, willing to share a GI, (GI, GI_MBR, GI_Time-Interval), on the network. Once the user uploads the GI content to IPFS, they use its CID together with the associated MBR and time interval, (GI_CID, GI_MBR, GI_Time-Interval), to construct a DSTree. In this step, the user is able to keep adding as many GI objects as they want to the DSTree. Once the construction of the DSTree is finished, the necessary metadata are also added to the data structure, and an IPLD object, DSTree_IPLD, is constructed from the DSTree's DAG. The DSTree_IPLD is then uploaded to the IPFS network, and the related IPFS root hash, DSTree_CID, and its metadata, DSTree_Metadata, are retrieved from IPFS. Since the DSTree_CID is generated by IPFS based on the content of the DAG, we need to share this CID with other users so that they can access the index. A smart contract is used to share the IPFS hash with other users. This smart contract is responsible for keeping a history of DSTree_CID hashes over time and providing the latest DSTree_CID hash to the users at any time. In our example, a simple smart contract written in Solidity is developed and deployed on the Ethereum test network. This smart contract is used in a web application in order to provide access to the latest version of the DSTree_CID hash and its metadata when a user visits the web application. Once the DSTree_CID hash is added to the smart contract, it will be available to all the users who connect to this smart contract. If a new user, User2, accesses the smart contract, then they will be able to fetch the metadata and the IPLD DAG structure related to the DSTree. Then, they will be able to replicate a version of the DSTree in their own local environment and, as a result, query data from that index. The results of the query, an array of GI_CIDs, are then requested from IPFS through the query process explained in Figure 5.
If User2 wants to add a new GI, (GI2, GI2_MBR, GI2_Time-Interval), to the dataset, they first add it to the DSTree and then construct a new IPLD graph and upload the data to IPFS. Since the content of the new DSTree is different from the previous one, a new IPFS hash is generated and DSTree2_CID is returned to the user. In the next step, User2 connects to the smart contract and adds DSTree2_CID as a new block to the underlying blockchain. At this point, all the users will be able to access the updated data, DSTree2_CID, through the blockchain. The older version of the DSTree, DSTree_CID, will also remain in the block history of the blockchain and will remain accessible.
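The role that the smart contract plays in this workflow (an append-only history of root CIDs plus a way to read the latest one) can be mimicked with a small in-memory Python class. This is only a stand-in to make the behavior concrete: it is not the Solidity contract used in the paper, it provides none of a blockchain's guarantees, and the names `CidRegistry`, `IndexVersion`, `publish`, `latest`, and `history` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IndexVersion:
    dstree_cid: str   # root CID of one published DSTree version
    metadata: dict    # metadata shared alongside the CID
    publisher: str    # who published this version (illustrative field)


class CidRegistry:
    """Append-only record of DSTree root CIDs, newest last."""

    def __init__(self) -> None:
        self._history: List[IndexVersion] = []

    def publish(self, dstree_cid: str, metadata: dict, publisher: str) -> int:
        """Append a new version and return its position in the history."""
        if not dstree_cid:
            raise ValueError("dstree_cid must be a non-empty string")
        self._history.append(IndexVersion(dstree_cid, metadata, publisher))
        return len(self._history) - 1

    def latest(self) -> Optional[IndexVersion]:
        """Return the most recently published version, if any."""
        return self._history[-1] if self._history else None

    def history(self) -> List[IndexVersion]:
        """Return a copy so callers cannot rewrite past versions."""
        return list(self._history)
```

In terms of the scenario above, User1 would call `publish` with the first root CID, User2 would read it through `latest`, rebuild the index locally, and later call `publish` again with the new root CID, while the earlier entry stays available through `history`.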

Discussion
Unlike centralized systems, data storage in P2P networks is distributed across network nodes, providing scalability and no single point of failure. However, managing and processing queries on these networks has always been challenging. The proposed method to share and query spatio-temporal data on distributed networks tackles this issue by tracking and updating a spatio-temporal index between network users. In this approach, a blockchain is responsible for keeping a history of different versions of a DSTree index. Each user can replicate a version of DSTree on their node and run spatio-temporal queries on the index. Since each user performs the queries on their side, the indexing tree should check fewer items and support more topological relations out of the box. As shown in Figure 7, the number of visited items in the DSTree is generally lower than in other indexing structures, which reduces memory consumption in client apps. While the octree also visits fewer nodes/items during its query process, its tree construction is slower (about 20%) than that of the other tree indexes. It also takes more time to answer queries (see Figure 6) since its internal intersection functions are three-dimensional. Octrees can also grow quickly if the time intervals are large [47]. In a single interval tree or quad-tree approach, the results of the queries need to be checked against the spatial or temporal topologies, respectively, during the post-processing stage. In addition, the DSTree can support six main temporal topological relations (see Table 1) during the query process without the need to post-process data. Time-wise, the DSTree also performs well on small to medium datasets (see Figure 6). Update, insertion, and deletion of existing data is another requirement for current data-sharing environments. Because an interval tree forms the top part of the tree, DSTree is not optimized for deleting items. Inserting a new GI will only update a portion of the tree structure and not impact the entire DAG. However, adding data with large intervals, such that the GI temporal interval covers branches from the left to the right side of the interval tree, can cause a restructuring of the entire tree. For example, for time-series data such as sensor data or VGI, the DSTree performs better. In our police department example, the newly reported incidents can be added on top of the DSTree without restructuring the entire tree.
Conflicts may appear when users update the DSTree and publish its latest version. There are several approaches to resolving such conflicts. Conflict resolution between different versions of DSTree can be achieved either on the client side before pushing the latest version to the blockchain or in the smart contract before saving DSTree_CID. Both of these methods require a mechanism to detect and address conflicts. Because of the size limitations of smart contracts, we use a client-side conflict detection approach. In this approach, the DSTree graph is extracted from the latest version available on the blockchain and is compared with the version of the graph on the client side [76]. If conflicts in the DSTree graph are detected (using tools like [77]), the user will need to resolve them and then publish the result on the blockchain. However, this approach requires trusted user interaction with the network.
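One way to picture the client-side comparison is as a per-leaf set difference between the latest published index and the local copy. The Python sketch below is an illustration under assumptions, not the tooling cited in [76] or [77]: each index is modeled as a mapping from a leaf path to the set of GI CIDs stored under it, and a "conflict" is taken to mean that publishing the local copy would drop entries that exist in the published version. The names `diff_indexes`, `LeafDiff`, and `would_drop_remote_entries` are hypothetical.

```python
from typing import Dict, NamedTuple, Set

# leaf path (e.g. "interval-index/quad-index") -> GI CIDs under that leaf
Index = Dict[str, Set[str]]


class LeafDiff(NamedTuple):
    only_remote: Set[str]  # present in the published version only
    only_local: Set[str]   # present in the client's version only


def diff_indexes(remote: Index, local: Index) -> Dict[str, LeafDiff]:
    """Return the leaves whose contents differ between the two versions."""
    diffs: Dict[str, LeafDiff] = {}
    for leaf in sorted(set(remote) | set(local)):
        r = remote.get(leaf, set())
        l = local.get(leaf, set())
        if r != l:
            diffs[leaf] = LeafDiff(only_remote=r - l, only_local=l - r)
    return diffs


def would_drop_remote_entries(diffs: Dict[str, LeafDiff]) -> bool:
    """True if publishing the local copy would lose published entries."""
    return any(d.only_remote for d in diffs.values())
```

Under this definition, a local copy that only adds new GI references produces differences but no conflict, whereas a copy built from an outdated version is flagged because it lacks entries that someone else has already published.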
The reason for using quad-trees is that, since the root bounding box of all the quad-trees is constant, the quad index part of the DSTree index will always point to the same area in geographic space over different time intervals. This provides a faster access method and the capability to exchange the quad-tree with a Discrete Global Grid System (DGGS). A DGGS grid, similar to a quad-tree, provides the same index value for each grid cell in space. It also provides methods to aggregate data at multiple resolution levels [78][79][80] and built-in data locality and directionality of space [81]. For instance, in sharing police department information over time, the reports can be censored using distributed k-anonymity methods on P2P networks (e.g., [82]) if the DGGS is used as feature data storage.
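The property used here, namely that a fixed root bounding box makes the quad index depend only on location, can be shown with a short Python sketch. It is a generic quad-key computation for a point, given as an assumption-laden illustration: the paper indexes MBRs rather than points, and the digit convention (0 = south-west, 1 = south-east, 2 = north-west, 3 = north-east), the `WORLD` bounding box, and the function name `quad_index` are choices made for this example only.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)

# Assumed constant root bounding box shared by every quad-tree.
WORLD: BBox = (-180.0, -90.0, 180.0, 90.0)


def quad_index(x: float, y: float, levels: int, root: BBox = WORLD) -> str:
    """Return one digit per level identifying the cell that contains (x, y)."""
    min_x, min_y, max_x, max_y = root
    if not (min_x <= x <= max_x and min_y <= y <= max_y):
        raise ValueError("point lies outside the root bounding box")
    digits = []
    for _ in range(levels):
        mid_x = (min_x + max_x) / 2.0
        mid_y = (min_y + max_y) / 2.0
        east = x >= mid_x
        north = y >= mid_y
        digits.append(str((2 if north else 0) + (1 if east else 0)))
        # Shrink the box to the chosen quadrant before the next level.
        min_x, max_x = (mid_x, max_x) if east else (min_x, mid_x)
        min_y, max_y = (mid_y, max_y) if north else (min_y, mid_y)
    return "".join(digits)
```

Because the root box never changes, the same location always maps to the same digit string regardless of which time interval it falls in, and a coarser index is always a prefix of a finer one, which is what makes multi-resolution aggregation straightforward.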

Data Locality in the IPFS with DSTree
Data locality implies that neighboring multidimensional information is stored in neighboring nodes [31]. In an IPFS network, once a node downloads some particular content, it can become a data provider for it. Combining data partitioning using DSTree and sharing data on IPFS at the GI level allows users to request only a small portion of the entire dataset. As a result, they can also serve small chunks of a large dataset. This can provide data locality in the P2P network based on user activities, e.g., on VGI platforms. In our police data sharing example, once the dispatch teams share their location on the network, they have already become a data provider for that shared GI. When the police officers visit that location, they download that GI and become another data provider for that specific GI on the IPFS network. This approach provides a level of data locality for nodes close to each other, since the nodes in a region usually tend to explore data related to their region. Other use cases of such data locality include sharing geo-tagged information in small communities.

DSTree Limitations
The proposed DSTree model only supports the spatial topologies that the underlying spatial indexing method supports. In this paper, we only experimented with the intersection topological relation. However, it is possible to perform other topological relations such as overlay, within, and crosses and also to perform KNN-based models. All of the temporal topologies are supported except disjoint. The focus of this paper was support for vector-based data structures. However, supporting raster data could be achieved by converting raster data into DGGS-based models or by tiling the raster data instead of generating the lower-level spatial index using the multi-resolution tiling structure. Table 4 summarizes supported queries and future approaches to support other data models. In this work, our focus was on spatio-temporal queries. However, semantic queries play an important role in data retrieval. Combining DSTree with models such as STKST-I [83] would potentially allow querying semantic data, along with providing a wider range of temporal topologies.

Conclusions
P2P has become a very popular decentralized approach for storing and sharing information. The daily amount of spatial data being collected and shared in different sectors and at different levels highlights the need for P2P data management, query, and processing. This paper proposes a new spatio-temporal multi-level tree structure, DSTree, which aims to address this problem. DSTree is capable of performing a range of spatio-temporal queries. To integrate this data structure on the IPFS distributed network, a framework that uses a blockchain to share the IPFS CID of the index is proposed. Each user is capable of replicating DSTree and querying or updating it. However, this model is not optimized for deletion and is mainly suitable for append-only data over time. In this work, some of the challenges in sharing and querying spatio-temporal data on distributed networks are addressed. Despite the advantages of our proposed framework, challenges remain. We conclude that more significant research effort from GIScience and related fields in developing decentralized applications is needed. The need for the standardization of different feature types and feature type properties when sharing data on the IPFS network is one of the requirements. The possibility of using IPLD objects in sharing GI at the feature level can provide finer access to the information. In addition, it is necessary to address attribute-level query processing, which is not covered in the current work. The use of the smart contract to control users' read and write access to the main chain can also be studied. This access control can even be at the feature level.

Figure 1. Abstraction levels of geographic information. Spatial indexes cluster units of Geographic Information (GI) at the abstract level and are used at the storage level in the different architectures.

Figure 2. An example of a DSTree index. Each index is constructed of two sub-parts. Part A is the interval tree index. Its length is equal to the temporal level (T). Part B is the quad-tree index.

Figure 3. An example of a DSTree constructed from a set of sample points. Each DSTree has a temporal (interval tree) component and a spatial (quad-tree) component. The final graph will be a stack of quad-trees on top of each other. Spatial level is the number of levels in the quad-tree.

Figure 4. Spatial and temporal location of the points in Figure 3. On the right side, the DSTree index related to each section of the graph is listed.

Figure 5. Three main scenarios to process queries using DSTree. Top: When a user requests only a spatial range, we search all the quad-trees in the DSTree. Middle: When the user queries spatial and temporal ranges together, DSTree first queries the interval tree part of the graph and then searches the quad-trees that exist at the bottom of those candidate nodes. Bottom: When the user only provides a temporal range, we only search the interval tree part of the DSTree and then simply query the root of the quad-tree in each candidate node.

Figure 6. The query processing time for different spatio-temporal access methods. The spatial and temporal extent of each query remained constant. The results are measured for 6 different temporal topologies.

Figure 7. Number of visited points in each model to answer a spatio-temporal query.

Figure 8. Query process and publishing spatio-temporal data on the IPFS. It shows the process of sharing the DSTree index between users using a blockchain, querying the content from the network by another user, and updating the data on the network.

Algorithm 2. Delete algorithm for the proposed DSTree

function DELETEGIDFROMDSTREE(Gid, GI_CID, GI_MBR, GI_Time-Interval)
    current_node ← QUERYGIDFROMDSTREE(Gid, GI_CID, GI_MBR, GI_Time-Interval)
    if current_node is found then
        // Remove the Gid from current_node
        if current_node is empty after deletion then
            // Delete the node from the tree
            // Reconstruct the DSTree if necessary
        end if
    end if
end function

function QUERYGIDFROMDSTREE(node)    ▷ Find the node which contains the current Gid
end function
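The control flow of Algorithm 2 can be mirrored in Python on a simplified stand-in for the index. The sketch assumes the tree is a nested dictionary (interval-tree index, then quad-tree index, then a list of GI CIDs) and that the caller already knows both index parts; it prunes empty nodes but does not perform the interval-tree reconstruction that the algorithm mentions. The function name `delete_gi` and the `Tree` alias are hypothetical.

```python
from typing import Dict, List

# interval-tree index -> quad-tree index -> GI CIDs stored in that leaf
Tree = Dict[str, Dict[str, List[str]]]


def delete_gi(tree: Tree, interval_index: str, quad_index: str,
              gi_cid: str) -> bool:
    """Remove one GI reference; return False if it was not in the tree."""
    leaves = tree.get(interval_index)
    if leaves is None:
        return False
    leaf = leaves.get(quad_index)
    if leaf is None or gi_cid not in leaf:
        return False
    leaf.remove(gi_cid)
    if not leaf:
        # The leaf is empty after deletion, so drop it from the tree.
        del leaves[quad_index]
        if not leaves:
            # No quad-tree cells remain under this temporal node.
            del tree[interval_index]
    return True
```

Since any change to the tree content yields a new root CID once the index is re-published, a deletion still requires uploading the updated index and sharing its new CID.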

Table 3. Necessary metadata objects required for sharing DSTree on IPFS.

Table 4. DSTree capabilities for different spatio-temporal data models.