GeoSOT-Based Spatiotemporal Index of Massive Trajectory Data

: With the rapid development of global positioning technologies and the pervasiveness of intelligent mobile terminals, trajectory data have shown a sharp growth trend both in terms of data volume and coverage. In recent years, increasing numbers of LBS (location based service) applications have provided us with trajectory data services such as tra ﬃ c ﬂow statistics and user behavior pattern analyses. However, the storage and query e ﬃ ciency of massive trajectory data are increasingly creating a bottleneck for these applications, especially for large-scale spatiotemporal query scenarios. To solve this problem


Introduction
With the development of sensor technologies and with the pervasiveness of intelligent mobile terminals, movements of humans and objects are increasingly being recorded as trajectories.Such trajectories offer a considerable amount of useful data that can be explored.In recent years, a broad range of LBS (location-based service) applications have been developed and improved through trajectory data mining [1], such as traffic flow statistical analysis [2], vehicle scheduling strategies [3], movement pattern analysis [4] for individuals or groups of moving objects, making sense of trajectories, and other applications of urban services.These applications significantly benefit common individuals, traffic organizations, and government agencies.
Before mining trajectory data, one must access different samples of trajectories or different parts of a trajectory many times.This stage is referred to as trajectory data indexing and retrieving [5], and it constitutes a fundamental step of many trajectory data mining tasks.In the case of tourist activities, for instance, for an application seeking to globally analyze activities performed by tourists during their stay in Beijing, the whole track left by tourists in Beijing is extracted (spatial constraints "inside Beijing").In addition, to analyze what tourists do in one day in Beijing or what they do on specific days (e.g., on weekends), each daily track of the tourists must be extracted (temporal constraints "from Saturday to Sunday") [6].For user similarity measurements, most applications usually select several trajectory segments in particular spatiotemporal units to model the structure of human mobile behavior [7].Obtaining trajectories within a specific period of time and space is essential for these applications to conduct further research.The problem is that the large amount of data leads to long query periods, which reduces the efficiency at which applications can mine data and provide corresponding services, especially for online applications requiring the instant mining of trajectory data (e.g., detecting traffic anomalies).Since query efficiency can be improved to some extent through the establishment and use of reasonable indexes, we can design an appropriate index to solve these problems and to allow more time for data mining and analysis than for the query phase.
In accounting for the dynamic characteristics of trajectory data and the strong demand for spatiotemporal range queries, we contend that an optimal spatiotemporal index must meet the following requirements: (1) The index must treat the spatial and time dimensions equally.For many platforms that provide trajectory data management and analysis services, spatial and temporal restrictions are placed on data queries.When faced with massive data, it is necessary to ensure that filtering by time and spatial restrictions is not performed in two operations.It is best to filter three dimensions (latitude, longitude, and time) at a time.(2) The index should not be sensitive to high-frequency data updates.Considering the dynamic nature of moving objects, the index should not be updated with the object's movements at all times, as otherwise maintenance costs would be too high.(3) The index should be easy to integrate into existing data management systems.While the method proposed in [8,9] can establish corresponding effective index structures for a query problem to facilitate query processing, it is extremely complicated and difficult to integrate into an existing DBMS (database management system).We believe that a good spatiotemporal index should be "generalized" and easy to integrate and should not overwhelm the existing data organization scheme.
In this paper, we propose a novel index to solve the efficiency problem of spatiotemporal range queries.The indexing method extends the GeoSOT spatial subdivision model to the time dimension by combining the principle of spatial partitioning with time partitioning.Based on the GeoSOT spatial subdivision model, we applied the principle of time division and constructed a new spatiotemporal indexing model called the GeoSOT spatiotemporal index (GeoSOT ST-index).An effective processing method for large-scale spatiotemporal range queries is also introduced.After processing, the range query for trajectory data in three-dimensional space (time and 2D space) is converted into a one-dimensional code query in the GeoSOT index table.The rapid retrieval of trajectory data is achieved.

Related Work
To improve the efficiency of spatiotemporal retrieval, a common solution is to design and construct an appropriate index structure to greatly reduce the search space.Existing spatiotemporal indexes can be classified into two categories: object-oriented indexes and spatial-partitioning-based indexes.
Object-oriented spatiotemporal indexing technology mainly focuses on the deformation and expansion of two-dimensional spatial index structures.These indexes use individual trajectories' minimum bounding boxes (MBBs) as an indicator.These indexes usually have an R-tree-like structure, such as 3D R-tree [10], SETI [11], SEB-tree [12], and TB-tree structures [13].These types of indexes are trajectory oriented.Regardless of how long a trajectory lasts or how large a spatial region it covers, an MBB is created for it.When the amount of data quickly increases, considerable overlapping and coverage between MBBs results, and index performance and query efficiency decrease significantly [14].In addition, trajectory-oriented indexes must be updated as trajectory data are updated, leading to high maintenance costs [15,16].
Spatial-partitioning-based indexes have been widely employed in recent years due to their simplicity and include the quadtree [17] and grid indexes [18].Such indexes divide space in advance so that when moving objects keep moving within the same spatial unit, the index does not need to be updated.When faced with high-frequency concurrent operations, space-partitioning-based indexes are far superior to R-tree family indexes [19].Zheng [20] proposed the GAT (grid index for activity trajectories) for organizing trajectory segments and activities hierarchically.The indexing method employed involves dividing the whole space into a d-grid (2 d × 2 d grids) and further constructing a (d -1)-Grid, ..., 1-Grid.An inverted index [21] is built from the d layer to the upper layer to record the grid through which the moving object passes.Wang [22] proposed a spatiotemporal index for a road network based on quadtree.Adopting the space-first strategy, the geospatial space is first constructed based on the quadtree.The geographic hash code, i.e., the GeoHash value, is designed for spatial regions of different levels.Then, each Geohash code is combined with a time index to form a spatiotemporal composite index.Yang [23] developed GCOTraj, which partitions a large spatiotemporal data space into multidimensional grid cells and orders these grid cells in two different ways.The index performs well when dealing with large-scale spatial range queries.However, when the time query range is as large as the spatial query range, it performs poorly, as the time index is timestamp-based and thus only stores the start and end time stamps for each page.As increasing numbers of mobile terminals record trajectory data, the amount of data increases sharply over time.Thus, they are not suited for large-scale querying in time and space.As a spatial-partitioning-based index model, the GeoSOT (geographical coordinate subdividing grid with one dimension integer coding on 2 n -tree) has been widely used for spatial data indexing and retrieval [24][25][26][27][28].The above works proposed means of organizing and retrieving spatial data based on GeoSOT, but they only encode spatial information for the object without encoding its time information.In [29], a multiscale time partition and integer coding (MTIC) method is proposed.When using this method to index the time information of an object, high efficiency queries of multiple scales can be achieved.

GeoSOT Global Subdivision Model
The GeoSOT global subdivision model is a spatial indexing model based on spatial partitioning.With the intersection of the prime meridian and equator taken as the central point, GeoSOT recursively divides the surface of the earth into four grid cells [30].Through quadtree recursive subdivision, these grid cells form a multilevel grid of latitude and longitude.This approach is novel in that the original surface space is expanded from 180 • × 360 • to 512 • × 512 • , and 1 • (1 ) is expanded from 60 (60") to 64 (64").The result is a one-dimensional integral grid code on the 2 n -tree [31].Figure 1.shows the GeoSOT subdivision and encoding procedure.
trajectories) for organizing trajectory segments and activities hierarchically.The indexing method employed involves dividing the whole space into a d-grid (2 d ×2 d grids) and further constructing a (d -1)-Grid, ..., 1-Grid.An inverted index [21] is built from the d layer to the upper layer to record the grid through which the moving object passes.Wang [22] proposed a spatiotemporal index for a road network based on quadtree.Adopting the space-first strategy, the geospatial space is first constructed based on the quadtree.The geographic hash code, i.e., the GeoHash value, is designed for spatial regions of different levels.Then, each Geohash code is combined with a time index to form a spatiotemporal composite index.Yang [23] developed GCOTraj, which partitions a large spatiotemporal data space into multidimensional grid cells and orders these grid cells in two different ways.The index performs well when dealing with large-scale spatial range queries.However, when the time query range is as large as the spatial query range, it performs poorly, as the time index is timestamp-based and thus only stores the start and end time stamps for each page.As increasing numbers of mobile terminals record trajectory data, the amount of data increases sharply over time.Thus, they are not suited for large-scale querying in time and space.As a spatial-partitioning-based index model, the GeoSOT (geographical coordinate subdividing grid with one dimension integer coding on 2 n -tree) has been widely used for spatial data indexing and retrieval [24][25][26][27][28].The above works proposed means of organizing and retrieving spatial data based on GeoSOT, but they only encode spatial information for the object without encoding its time information.In [29], a multiscale time partition and integer coding (MTIC) method is proposed.When using this method to index the time information of an object, high efficiency queries of multiple scales can be achieved.

GeoSOT Global Subdivision Model
The GeoSOT global subdivision model is a spatial indexing model based on spatial partitioning.With the intersection of the prime meridian and equator taken as the central point, GeoSOT recursively divides the surface of the earth into four grid cells [30].Through quadtree recursive subdivision, these grid cells form a multilevel grid of latitude and longitude.This approach is novel in that the original surface space is expanded from 180° × 360° to 512° × 512°, and 1° (1′) is expanded from 60′ (60′′) to 64′ (64′′).The result is a one-dimensional integral grid code on the 2 n -tree [31].Figure 1.shows the GeoSOT subdivision and encoding procedure.
Each grid cell of each level is encoded by a geographical code according to the Z-order filling curve [32].The code exhibits the following characteristics: (1) Uniqueness: Each grid cell has a globally unique code corresponding to a rectangle on the Earth's surface.(2) One-dimensional: GeoSOT uses a one-dimensional number or binary bit string to represent a region of two-dimensional space.(3) Recursiveness: The GeoSOT lower-level grids are divided by the upper-level grid, essentially resulting in spatial Z-order filling curve padding.The four GeoSOT grid cells of a "Z" belong to the same parent grid, and their binary codes have the same prefix.Each grid cell of each level is encoded by a geographical code according to the Z-order filling curve [32].The code exhibits the following characteristics: (1) Uniqueness: Each grid cell has a globally unique code corresponding to a rectangle on the Earth's surface.(2) One-dimensional: GeoSOT uses a one-dimensional number or binary bit string to represent a region of two-dimensional space.
(3) Recursiveness: The GeoSOT lower-level grids are divided by the upper-level grid, essentially resulting in spatial Z-order filling curve padding.The four GeoSOT grid cells of a "Z" belong to the same parent grid, and their binary codes have the same prefix.

GeoSOT-Based Spatiotemporal Index Model
Trajectory data are typical spatiotemporal data and present dynamic characteristics.Even over one second, a large number of mobile terminals update their locations.Thus, we consider extending the GeoSOT spatial partitioning model to a spatiotemporal partitioning model.With this approach, the index will not be updated as long as the moving object does not exceed the space-time unit.
First, we define the scope of the GeoSOT ST-index model: Spatial extent: geographic longitude [−256 As noted above, the spatial extent is consistent with GeoSOT.When selecting a time period, we mainly consider the following two points.The first point relates to the timeliness of the trajectory data.Since GPS trajectories have only been produced over the past few decades, we start with the year in which the Internet was first created, i.e., 1969.Second, the GeoSOT system has 2 n integer characteristics (that is, the spatial units of all granularities including degrees, minutes, seconds, and sub-seconds are integers), which requires that the time interval is also preferably of a power of 2. In theory, as long as the number of years is satisfied, a power of 2 can be used as the time range of division (e.g., 128 years, 256 years, and 512 years).However, the longer a time period is, the more subdivisions should be made to obtain the time granularity of a given day or hour.Therefore, a 128-year period (the 7th power of 2) is selected in this paper.As long as seven rounds of subdivision are applied, a one-hour resolution is obtained, fully meeting the needs of many trajectory data mining applications to retrieve data from hour to year intervals.In addition, to maintain integer characteristics of the grid, the time period should also be extended accordingly.The specific expansion scheme used is shown in Table 1.Each time grid includes the integer year, month, day, and hour code.By recursively performing Octree [33] partitioning on the space-time cube defined by the GeoSOT ST-index model, a multilevel cube of latitude, longitude, and time with hierarchical characteristics is attained.The subdivision and coding method of the GeoSOT ST-index model is shown in Figure 2. The temporal and geographical scale of the cube at different levels is shown in Table 2.The encoding method is based on the Z-order three-dimensional space filling curve, and the trend of the filling curve is consistent with that of GeoSOT.We refer to this as the GeoSOT space-time code (ST-code).
ISPRS Int.J. Geo-Inf.2019, 8, x FOR PEER REVIEW 5 of 14 Table 2..The encoding method is based on the Z-order three-dimensional space filling curve, and the trend of the filling curve is consistent with that of GeoSOT.We refer to this as the GeoSOT space-time code (ST-code).For a given level, any trajectory point falls into only one space-time unit whereas a space-time unit can contain multiple trajectory points.The GeoSOT ST-index structure of the trajectory data is shown in Figure 3.When a moving object frequently updates its position within a space-time grid, it will not cause the index to update.A reduction in the number of updates means there are no long locks, definitely improving query efficiency levels. 1 hour 1″ For a given level, any trajectory point falls into only one space-time unit whereas a space-time unit can contain multiple trajectory points.The GeoSOT ST-index structure of the trajectory data is shown in Figure 3..When a moving object frequently updates its position within a space-time grid, it will not cause the index to update.A reduction in the number of updates means there are no long locks, definitely improving query efficiency levels.
When inserting a new trajectory, we must first establish the mapping relationship between the trajectory point and GeoSOT ST-code and then insert the new trajectory into the corresponding code node point by point.When a trajectory is to be deleted, the first step is to convert the geographical location and time information of the trajectory into corresponding GeoSOT ST-codes.Then, the trajectory ID of each code node is removed.Query operation is the focus of this paper.Specific instructions are presented in Section 4.

GeoSOT Space-Time Code for One Trajectory Point
A trajectory point is usually represented as a triplet p = (x, y, t), where x and y represent the latitude and longitude coordinates of the moving object, respectively, and t represents the timestamp at each position.A trajectory is a series of chronologically ordered points, i.e., T = <p1, p2, …, pn> along with a moving object id.The purpose of the GeoSOT spatiotemporal index for trajectory data is to establish the relationship between a grid and trajectory point.The specific steps involved are as follows: (1) The optimum level: The subdivision level can be determined according to specific application requirements.When not specified, the default is the 21st level.One can obtain the temporal scale rtn and spatial scale rsn of the n th level space-time cube from the resolution lookup table.
(2) The decimal value of the longitude, latitude and time grid: First, the latitude and longitude of the trajectory point are converted into a degree-minutesecond format, i.e., D°M'S''.The timestamp is then converted into a year, month, day and hour format, i.e., YMDH.Then, the longitude, latitude, and time are respectively converted into a GeoSOT space-time decimal value with Equations (1-3): When inserting a new trajectory, we must first establish the mapping relationship between the trajectory point and GeoSOT ST-code and then insert the new trajectory into the corresponding code node point by point.When a trajectory is to be deleted, the first step is to convert the geographical location and time information of the trajectory into corresponding GeoSOT ST-codes.Then, the trajectory ID of each code node is removed.Query operation is the focus of this paper.Specific instructions are presented in Section 4.

GeoSOT Space-Time Code for One Trajectory Point
A trajectory point is usually represented as a triplet p = (x, y, t), where x and y represent the latitude and longitude coordinates of the moving object, respectively, and t represents the timestamp at each position.A trajectory is a series of chronologically ordered points, i.e., T = <p 1 , p 2 , . . ., p n > along with a moving object id.The purpose of the GeoSOT spatiotemporal index for trajectory data is to establish the relationship between a grid and trajectory point.The specific steps involved are as follows: (1) The optimum level: The subdivision level can be determined according to specific application requirements.When not specified, the default is the 21st level.One can obtain the temporal scale rt n and spatial scale rs n of the nth level space-time cube from the resolution lookup table.
(2) The decimal value of the longitude, latitude and time grid: First, the latitude and longitude of the trajectory point are converted into a degree-minute-second format, i.e., D • M'S".The timestamp is then converted into a year, month, day and hour format, i.e., YMDH.Then, the longitude, latitude, and time are respectively converted into a GeoSOT space-time decimal value with Equations ( 1)-(3):  (3) The GeoSOT ST-code of one trajectory point The Z curve is applied to transform three-dimensional spatial data into a one-dimensional array and to represent the encoding of cubes [34].The code is simply calculated by interleaving the binary representations of its coordinate values, which call for the same length of binary representations in each dimension.
The above three values (Code Lon , Code Lat , and Code Lon ) are transformed into n-bit binary sequences LatB, LonB, and TimeB, respectively.Then, the values are converted into the corresponding Morton code [35] by interleaving bits of the three binary sequences (see Figure 4).Since the maximum number of subdivisions per dimension is 21, a binary sequence of no more than 63 bits is attained after interleaving.This sequence can be stored as a 64-bit unsigned integer.

Query Processing
Query processing is designed to convert the spatiotemporal query range into GeoSOT spacetime codes.The key task is to select the appropriate level of GeoSOT ST-codes set corresponding to the query range.When the level is too high, the query range may map into many small grids (see Figure 6.a), rendering retrieval conditions too trivial and reducing query efficiency.When the level is too low, the query range may reflect only a small proportion of the subdivision grids (see Figure 6.b), leading ST-code retrieval to involve too much data that does not intersect with the query area [36].Query accuracy cannot, in turn, be guaranteed.To solve the above problem, we propose mapping the query region to one to eight GeoSOT ST-codes to ensure that the actual query range is as continuous as possible and that the query result is as accurate as possible.The main premise of query range processing is to find the minimum partition level N from the scale table, where the space-time cube is just greater than or equal to the query cube.The GeoSOT ST-code of the eight vertices in the query area is calculated at this level.Usually, a query cube may cover 1~8 space-time units of the N th layer.The specific steps involved are as follows: (1) The coverage of the query range along the longitude, latitude, and time dimensions Dis_L = Lmax-Lmin (4) Dis_T = Tmax-Tmin (2) The optimum level

Processing
Query processing is designed to convert the spatiotemporal query range into GeoSOT space-time codes.The key task is to select the appropriate level of GeoSOT ST-codes set corresponding to the query range.When the level is too high, the query range may map into many small grids (see Figure 6a), rendering retrieval conditions too trivial and reducing query efficiency.When the level is too low, the query range may reflect only a small proportion of the subdivision grids (see Figure 6b), leading ST-code retrieval to involve too much data that does not intersect with the query area [36].Query accuracy cannot, in turn, be guaranteed.To solve the above problem, we propose mapping the query region to one to eight GeoSOT ST-codes to ensure that the actual query range is as continuous as possible and that the query result is as accurate as possible.

Query Processing
Query processing is designed to convert the spatiotemporal query range into GeoSOT spacetime codes.The key task is to select the appropriate level of GeoSOT ST-codes set corresponding to the query range.When the level is too high, the query range may map into many small grids (see Figure 6.a), rendering retrieval conditions too trivial and reducing query efficiency.When the level is too low, the query range may reflect only a small proportion of the subdivision grids (see Figure 6.b), leading ST-code retrieval to involve too much data that does not intersect with the query area [36].Query accuracy cannot, in turn, be guaranteed.To solve the above problem, we propose mapping the query region to one to eight GeoSOT ST-codes to ensure that the actual query range is as continuous as possible and that the query result is as accurate as possible.The main premise of query range processing is to find the minimum partition level N from the scale table, where the space-time cube is just greater than or equal to the query cube.The GeoSOT ST-code of the eight vertices in the query area is calculated at this level.Usually, a query cube may cover 1~8 space-time units of the N th layer.The specific steps involved are as follows: (1) The coverage of the query range along the longitude, latitude, and time dimensions Dis_L = Lmax-Lmin ( 4) (2) The optimum level The main premise of query range processing is to find the minimum partition level N from the scale table, where the space-time cube is just greater than or equal to the query cube.The GeoSOT ST-code of the eight vertices in the query area is calculated at this level.Usually, a query cube may cover 1~8 space-time units of the Nth layer.The specific steps involved are as follows: (1) The coverage of the query range along the longitude, latitude, and time dimensions (2) The optimum level Dis_L, Dis_B, and Dis_T are compared, and the maximum value is selected as the granularity δ of the query cube, i.e., δ = max(Dis_L,Dis_B,Dis_H).Assuming that N is the recommended level, the scale lookup table is K as in Table 2, n is a level of table K, and scale(n) is the spatiotemporal scale of the Nth layer.Then, recommended rules for the optimum level are given by Equations ( 7) and (8).
(3) Space-time codes covering the entire query cube After determining level N, space-time codes of eight vertices of the query cube are calculated according to Equations ( 1)-( 3).The GeoSOT space-time codes obtained after de-duplication are then treated as the final retrieval condition.

Performance Results and Discussion
To evaluate the effectiveness and scalability of the GeoSOT spatiotemporal index, the spatiotemporal query method based on the GeoSOT spatiotemporal index is applied to the nonrelational database MongoDB.In the era of big data, nonrelational databases are becoming more widely used.MongoDB, as a document-based nonrelational database, is chosen in this paper because it supports a flexible design pattern and has a rich data format suitable for trajectory data storage.In addition, the built-in spatial index of MongoDB can be used to draw comparisons to our proposed index.
The dataset used for our experiments is a GPS trajectory dataset collected by Microsoft [37,38].This dataset includes approximately 10 million GPS trajectories of 10,357 taxis in Beijing.The MongoDB database was deployed in the single-node mode on a physical server.The server was equipped with one CPU (2 GHz Intel Core i5 CPU), 8 GB of RAM, and a 256 G hard disk.The server ran on the MacOS10.13.2 operating system.
We created a single trajectory data table in MongoDB and imported all of the experimental data.Each trajectory point was treated as an independent document and recorded the following fields: taxi ID, longitude, latitude, timestamp, GeoSOT ST-code, speed, and direction.The GeoSOT ST-index was implemented by establishing a B-tree for the GeoSOT ST-code field.

The Efficiency of GeoSOT ST-Code Generation
This experiment mainly evaluated the space-time code generation time of the 10th~20th level for the same data volume.The experimental results show (Figure 7) that the encoding time tended to increase with the partition level.However, at any level, the conversion of ST-code and trajectory data was completed within 3 microseconds.The encoding time can be neglected in practical applications.
ISPRS Int.J. Geo-Inf.2019, 8, x FOR PEER REVIEW 10 of 14 Table 2., n is a level of table K, and scale(n) is the spatiotemporal scale of the Nth layer.Then, recommended rules for the optimum level are given by Equations (7)(8).
if δ=scale n , n∈K, then N=n (7) if scale n+1 <δ<scale n , n∈K, then N=n+1 ( (3) Space-time codes covering the entire query cube After determining level N, space-time codes of eight vertices of the query cube are calculated according to Equations (1-3).The GeoSOT space-time codes obtained after de-duplication are then treated as the final retrieval condition.

Performance Results and Discussion
To evaluate the effectiveness and scalability of the GeoSOT spatiotemporal index, the spatiotemporal query method based on the GeoSOT spatiotemporal index is applied to the nonrelational database MongoDB.In the era of big data, nonrelational databases are becoming more widely used.MongoDB, as a document-based nonrelational database, is chosen in this paper because it supports a flexible design pattern and has a rich data format suitable for trajectory data storage.In addition, the built-in spatial index of MongoDB can be used to draw comparisons to our proposed index.
The dataset used for our experiments is a GPS trajectory dataset collected by Microsoft [37,38].This dataset includes approximately 10 million GPS trajectories of 10,357 taxis in Beijing.The MongoDB database was deployed in the single-node mode on a physical server.The server was equipped with one CPU (2 GHz Intel Core i5 CPU), 8 GB of RAM, and a 256 G hard disk.The server ran on the MacOS10.13.2 operating system.
We created a single trajectory data table in MongoDB and imported all of the experimental data.Each trajectory point was treated as an independent document and recorded the following fields: taxi ID, longitude, latitude, timestamp, GeoSOT ST-code, speed, and direction.The GeoSOT ST-index was implemented by establishing a B-tree for the GeoSOT ST-code field.

The Efficiency of GeoSOT ST-Code Generation
This experiment mainly evaluated the space-time code generation time of the 10 th ~20 th level for the same data volume.The experimental results show (Figure 7.) that the encoding time tended to increase with the partition level.However, at any level, the conversion of ST-code and trajectory data was completed within 3 microseconds.The encoding time can be neglected in practical applications.

Comparisons of Spatiotemporal Range Query Efficiency Levels
This experiment verified the effectiveness of the GeoSOT ST-index for range queries.We compared the GeoSOT ST-index with the other two composite indexes.One composite method was realized with three independent indexes: a B-tree index on latitude, a B-tree index on longitude, and a B-tree index on time.The other method was realized with two independent indexes: a 2D location index on the longitude and latitude fields (a built-in spatial index in MongoDB) and a B-tree index on time.The spatiotemporal range query performances of the three indexes were compared using the same amount of data.We randomly selected points in Beijing and use them as a north-south corner to generate rectangles of 0.01 • × 0.01 • , 0.02 • × 0.02 • , 0.04 • × 0.04 • , and 0.08 • × 0.08 • .Time intervals were set to 1 h, 2 h, 4 h, and 8 h.We performed 50 query tests on each spatiotemporal range and calculated the average time.The experimental results are shown in Figure 8.

Comparisons of Spatiotemporal Range Query Efficiency Levels
This experiment verified the effectiveness of the GeoSOT ST-index for range queries.We compared the GeoSOT ST-index with the other two composite indexes.One composite method was realized with three independent indexes: a B-tree index on latitude, a B-tree index on longitude, and a B-tree index on time.The other method was realized with two independent indexes: a 2D location index on the longitude and latitude fields (a built-in spatial index in MongoDB) and a B-tree index on time.The spatiotemporal range query performances of the three indexes were compared using the same amount of data.We randomly selected points in Beijing and use them as a north-south corner to generate rectangles of 0.01° × 0.01°, 0.02° × 0.02°, 0.04° × 0.04°, and 0.08° × 0.08°.Time intervals were set to 1 hour, 2 hours, 4 hours, and 8 hours.We performed 50 query tests on each spatiotemporal range and calculated the average time.The experimental results are shown in Error!Reference source not found.. From the results of the effectiveness test, the GeoSOT ST-index performs better than the other two indexes.On average, the proposed index consumes approximately 40% less time than the MongoDB spatiotemporal composite index.In addition, an increase in the query range has less of an effect on the query efficiency of the GeoSOT ST-index and more of an impact on the other two composite indexes.This phenomenon indicates that composite indexes can only filter onedimensional data (space or time) at a time while other dimensions can only be filtered afterward.As the query area expands, a large number of records must still be filtered after first filtering by the primary index.The GeoSOT ST-index reduces the number of three-dimensional queries made in one dimension, filtering latitude, longitude, and time simultaneously.The GeoSOT ST-index is especially good at dealing with range queries of large spatial ranges and long time intervals.

Comparisons of Scalability Performance
This experiment was carried out to evaluate the GeoSOT ST-index's scalability and to demonstrate the extension of query consumption time observed under different data volumes.We randomly selected the time period of interest "2011-03-07 09:29:08"~"2011-03-07 15:01:09" and a rectangular bounding box of interest with coordinates (116.612857°E, 39.856973 °N) from the lower From the results of the effectiveness test, the GeoSOT ST-index performs better than the other two indexes.On average, the proposed index consumes approximately 40% less time than the MongoDB spatiotemporal composite index.In addition, an increase in the query range has less of an effect on the query efficiency of the GeoSOT ST-index and more of an impact on the other two composite indexes.This phenomenon indicates that composite indexes can only filter one-dimensional data (space or time) at a time while other dimensions can only be filtered afterward.As the query area expands, a large number of records must still be filtered after first filtering by the primary index.The GeoSOT ST-index reduces the number of three-dimensional queries made in one dimension, filtering latitude, longitude, and time simultaneously.The GeoSOT ST-index is especially good at dealing with range queries of large spatial ranges and long time intervals.

Comparisons of Scalability Performance
This experiment was carried out to evaluate the GeoSOT ST-index's scalability and to demonstrate the extension of query consumption time observed under different data volumes.We randomly selected the time period of interest "2011-03-07 09:29:08"~"2011-03-07 15:01:09" and a rectangular bounding box of interest with coordinates (116.612857• E, 39.856973 • N) from the lower left corner and coordinates (116.692943• E, 39.937059 • N) from the upper right corner.We fixed this time and space range and conducted query tests under different data volumes.
The results are shown in Figure 9.Under a fixed query range, as the amount of query data increased, the three indexes consumed more query time, but the GeoSOT ST-index showed a more gradual increase in the amount of time consumed.When the amount of data was increased 10-fold, the query time remained stable.This scalability can be attributed to the fact that the essence of spatiotemporal querying with the GeoSOT ST-index is to perform a one-dimensional search on a B-tree, which exhibits O(logN) time complexity.Additionally, since the GeoSOT ST-index is implemented by encoding each track point record and by building a B-tree for the space-time code in the database, space complexity is defined as O(N) where N is the number of track points.
left corner and coordinates (116.692943°E, 39.937059 °N) from the upper right corner.We fixed this time and space range and conducted query tests under different data volumes.
The results are shown in Error!Reference source not found..Under a fixed query range, as the amount of query data increased, the three indexes consumed more query time, but the GeoSOT STindex showed a more gradual increase in the amount of time consumed.When the amount of data was increased 10-fold, the query time remained stable.This scalability can be attributed to the fact that the essence of spatiotemporal querying with the GeoSOT ST-index is to perform a onedimensional search on a B-tree, which exhibits O(logN) time complexity.Additionally, since the GeoSOT ST-index is implemented by encoding each track point record and by building a B-tree for the space-time code in the database, space complexity is defined as O(N) where N is the number of track points.
It can be inferred that when the amount of data increases to one billion or even one trillion, the GeoSOT ST-index will be much more efficient than the other indexes.We thus think our proposed index meets the current requirements of massive trajectory data management applications.

Conclusions and Directions for Future Work
This paper proposes a time-space partitioning-based indexing method for massive trajectory data.This method establishes a global spatiotemporal indexing model by extending the GeoSOT subdivision model.In theory, this method can index trajectory data for small towns to the global scale with a time span of 128 years.We applied the index model to the NoSQL database MongoDB by combining the GeoSOT ST-code and B-tree index.The index has the advantages of being easy to integrate with existing database systems and offering efficient range query capacities through standard SQL.In addition, the cost of the index update is low.When data are updated, one must only add or remove data table records.Through experimental comparisons, we found that both the range query efficiency and scalability of the GeoSOT ST-index are superior to those of the other two spatiotemporal composite indexes, especially when the query scale is large in both space and time.
Future work will focus on more complex queries of the index model proposed in this paper.More comparisons will be drawn to existing solutions, such as ArcGIS and GeoMesa, to better assess the performance of the proposed solution.Author Contributions: Chunyao Qian conceived, designed, and performed the experiments and wrote the manuscript; Chengqi Cheng and Chao Yi supervised the study; and Guoliang Pu offered helpful suggestions and reviewed the manuscript; Xiaofeng Wei revised the manuscript critically; Huangchuang Zhang polished the language.All authors have read and approved of the submitted manuscript, have agreed to be listed and have accepted this version for publication.It can be inferred that when the amount of data increases to one billion or even one trillion, the GeoSOT ST-index will be much more efficient than the other indexes.We thus think our proposed index meets the current requirements of massive trajectory data management applications.

Conclusions and Directions for Future Work
This paper proposes a time-space partitioning-based indexing method for massive trajectory data.This method establishes a global spatiotemporal indexing model by extending the GeoSOT subdivision model.In theory, this method can index trajectory data for small towns to the global scale with a time span of 128 years.We applied the index model to the NoSQL database MongoDB by combining the GeoSOT ST-code and B-tree index.The index has the advantages of being easy to integrate with existing database systems and offering efficient range query capacities through standard SQL.In addition, the cost of the index update is low.When data are updated, one must only add or remove data table records.Through experimental comparisons, we found that both the range query efficiency and scalability of the GeoSOT ST-index are superior to those of the other two spatiotemporal composite indexes, especially when the query scale is large in both space and time.
Future work will focus on more complex queries of the index model proposed in this paper.More comparisons will be drawn to existing solutions, such as ArcGIS and GeoMesa, to better assess the performance of the proposed solution.

Figure 1 .
Figure 1.The GeoSOT subdivision and encoding procedure.Figure 1.The GeoSOT subdivision and encoding procedure.

Figure 1 .
Figure 1.The GeoSOT subdivision and encoding procedure.Figure 1.The GeoSOT subdivision and encoding procedure.

Figure 2 .
Figure 2. The subdivision and encoding method of the GeoSOT ST-index model.Figure 2. The subdivision and encoding method of the GeoSOT ST-index model.

Figure 2 .
Figure 2. The subdivision and encoding method of the GeoSOT ST-index model.Figure 2. The subdivision and encoding method of the GeoSOT ST-index model.

Figure 5 .
Figure 5.The retrieve model based on the GeoSOT ST-index.

Figure 6 .
Figure 6.A comparison of covering the same space-time cube with subdivision grids of different sizes.

Figure 5 .
Figure 5.The retrieve model based on the GeoSOT ST-index.

14 Figure 5 .
Figure 5.The retrieve model based on the GeoSOT ST-index.

Figure 6 .
Figure 6.A comparison of covering the same space-time cube with subdivision grids of different sizes.

Figure 6 .
Figure 6.A comparison of covering the same space-time cube with subdivision grids of different sizes.

Table 1 .
Time division scheme.

Table 2 .
GeoSOT spatiotemporal cube scales of different levels.

Table 2 .
GeoSOT spatiotemporal cube scales of different levels.