Similarity Search on Semantic Trajectories Using Text Processing

The use of location-based sensors has increased exponentially. Tracking moving objects has become increasingly common, consolidating a new field of research that focuses on trajectory data management. Such trajectories may be semantically enriched using sensors and social media, which enables a detailed analysis of trajectory behavior patterns. One of the problems in this field is searching semantic trajectory databases in a flexible and adaptable way: flexible in the sense of retrieving the trajectories closest to the user's query rather than only exact matches, and adaptable in the sense of adjusting to different types of semantic trajectories. This article proposes a new approach for representing and querying semantic trajectories based on text-processing techniques. Furthermore, we describe a framework, called SETHE (SEmantic Trajectory HuntEr), that performs similarity queries on semantically enriched trajectory databases. SETHE can be adapted according to the aspect types posed in user queries. We also present an evaluation of the proposed framework using a real dataset and compare our results with those of state-of-the-art approaches.


Introduction
The proliferation of smartphones, low-cost sensors, and wireless communication devices has enabled the monitoring of mobile entities in geographic space, such as people, animals, cars, ships, and natural phenomena. Currently, there are several ways to obtain the location of moving objects, with the Global Positioning System (GPS) being the most straightforward and common way to construct raw trajectories [1]. A raw trajectory consists of a sequence of geospatial points (latitude, longitude, and altitude coordinates) ordered by timestamps [2][3][4]. Trajectory data are important for analyzing and understanding the behavior of moving objects. For example, trajectory analysis may identify traffic jams, people's behavior patterns, navigation routes, fishing areas, animal migration, and hurricane trajectories [5].
Many studies have enriched trajectory data by including context-based information. Emmanouilidis et al. [6] defined context as a synonym for the range of information that may influence service adaptation. Such information may arise from the environment, user, or other systems. Context-based information enables the enrichment of trajectory analysis and improves the understanding of moving-object behavior [7]. The use of context information can provide insights into the behavioral aspects of mobile objects that would not be possible using only raw trajectories, such as which point of interest (POI) was visited, the type of activities performed, and the trajectory purpose.
Ubiquitous computing and the Internet of Things help obtain context-based information [8]. Various devices, such as smartwatches, medical sensors, radio frequency identification (RFID) devices, and environmental sensors, can capture context-based information. Another way of implicitly obtaining trajectory data and context information is through volunteered geographic information (VGI) [9], which consists of geographic data provided by citizens through collaborative platforms such as LinkedGeoData (http://linkedgeodata.org/, accessed on 22 May 2022) and OpenStreetMap (https://www.openstreetmap.org/, accessed on 22 May 2022). In addition, some social media platforms, such as Flickr (https://www.flickr.com/, accessed on 22 May 2022), Twitter (https://twitter.com/, accessed on 22 May 2022), Facebook (https://www.facebook.com/, accessed on 22 May 2022), and Foursquare (https://foursquare.com/, accessed on 22 May 2022), attach geolocation to their posts. Other behavioral information may also be extracted from social media, such as the user's activities and POI evaluations.
Adding context information to trajectory data creates a semantically enriched trajectory, or simply a semantic trajectory [10,11]. In a semantic trajectory dataset, trajectories contain annotations: the waypoints are enriched with information about either the environment or the mobile object's context, such as the POI name or the user's heart rate. An aspect is any type of information that can be attached to the trajectory POIs; examples of POI aspects include the name, category, weather, means of transport, and rating [12]. Hence, the trajectory becomes a complex object with several contextual data dimensions associated with its movement [13].
To better understand semantic trajectories, consider the example in Figure 1, which depicts the short trajectory of a tourist in Pisa, Italy. Each stop is semantically enriched with four aspects: POI name, category, means of transport used to reach the POI, and environmental temperature. The route starts at the POI Cappella dal Pozzo, which belongs to the Chapel category; the tourist is walking, and the local temperature is 22 °C. Then, the tourist moves by bus to the Museo delle Sinopie, where the temperature is 21 °C. Finally, the route ends at Teatro Sant'Andrea, where the tourist arrives by taxi and the temperature is 23 °C.
When dealing with semantic trajectories, we need to decide how to represent context information. For example, Noël et al. [14] represented trajectories in a multidimensional manner, in which each dimension focuses on a single aspect and each aspect is represented as a trajectory. Table 1 presents this representation for the trajectory in Figure 1, where a transition event is the displacement between two stopping points. The first line is the POI name trajectory, which begins at Cappella dal Pozzo and ends at Teatro Sant'Andrea. The category trajectory follows in the same order, starting at a chapel and ending at a theater, and is followed by the means of transport and temperature trajectories. Hence, it is possible to analyze trajectories from different viewpoints and to answer queries concerning particular aspects, for example, searching for trajectories in which a person travels only by bus, or trajectories that start at a mall and end at a theater.
Current search engines on semantic trajectory datasets retrieve only the set of trajectories that exactly matches each constraint defined in the user query. For example, suppose someone is looking for the trajectories of people arriving at a given church by bus. In this case, the query result will only contain trajectories whose category attribute equals church and whose means of transport attribute equals bus. Finding trajectories that perfectly match all query constraints can be difficult, and the more constraints a query has, the harder it becomes. For example, a query for trajectories of people who went by taxi to a chapel, then by bus to a museum, and walked to a theater would not return the trajectory shown in Figure 1. Of all the constraints in the query, only the means of transport for the first and last stops are not satisfied by that trajectory. Hence, even though it satisfies almost all query restrictions, the trajectory depicted in Figure 1 is not retrieved.
A query on a semantic trajectory database expresses the arrangement of stop points along the trajectory [15], for example, searching for trajectories that start at the Leaning Tower of Pisa, trajectories that end at a museum, or trajectories that visit a church and then a theater.
To address these limitations, this study proposes a semantic trajectory framework that represents multi-aspect trajectories and, through a ranking approach, searches for the trajectories most similar to the aspect values contained in the query. The framework queries a semantic trajectory database using text-processing techniques: the trajectory is represented as a string vector, in which each string may represent a POI's name, category, or another aspect. A query is also represented as a vector, and the distance between the query and trajectory vectors determines the matching and ranking of the result set. As a baseline, we used the semantic trajectory search framework developed by Izquierdo et al. [15], which describes a formal framework for semantic trajectories using description logic (DL) and SPARQL.
Thus, the main contributions of this article are as follows:
• A new approach to represent trajectory data based on text.
• A search engine for querying semantic trajectories that takes into account not only POI names and categories but also other semantic trajectory aspects.
• A new ranking algorithm that enables trajectory similarity search.
• A simple approach to querying semantic trajectories that is more efficient, in terms of execution time and storage requirements, than the SPARQL-based approach.
To validate our approach, we implemented a case study using TripBuilder [16], a trajectory dataset built from Flickr data, combined with Wikipedia data.
The remainder of the article is structured as follows. Section 2 discusses related work. Section 3 presents the fundamental concepts and the formal definition of the semantic trajectory query framework. Section 4 presents a running example to instantiate the SETHE framework. Section 5 describes the experiments performed. Finally, Section 6 concludes the paper and discusses further work to be undertaken.

Related Work
Usually, raw trajectory data are captured and stored in a spatial database known as a moving object database (MOD) [17]. Once these data are captured, they usually need to be enriched with context-based information to support richer analyses and to enable tasks such as identifying traffic jams, finding people's behavior patterns, observing navigation routes, identifying fishing areas, studying animal migration, and understanding hurricane trajectories [5]. SeMiTri [18] is a trajectory enrichment system that uses semantic annotation to identify trajectory stops and moves [19]. SeMiTri semantically describes trajectories with information about the POI, means of transport, and type of geographic region (residential, business, market, etc.).
CONSTAnT is a conceptual data model that represents the main aspects of a semantic trajectory [20]. The model is divided into two parts. The first part describes simple entities, providing information about the mobile object, trajectory, sub-trajectory, semantic points, environment, places, and events. The second part describes complex entities in which data mining techniques are utilized to identify information such as the purpose of movement, means of transport used, and behavior of the moving object.
Noël et al. [14] proposed a semantic trajectory model composed of multiple aspects, where each aspect has a group of related attributes. The authors argue that a semantic trajectory can be analyzed from different points of view, such as residential and professional. From the residential point of view, the attributes are the name of the city where the user stayed, the type of place (house or apartment), and the rent; from the professional point of view, they are the work, occupation, and salary.
RDF graphs and ontologies have also emerged as solutions to enrich semantic trajectories [13,21]. The representation of semantic trajectory data in RDF enables the inference of new knowledge and the publication of data as linked open data (LoD). The CRISIS system is an example of an application that deals with trajectory data streams and uses an RDF graph that semantically represents the marine data received from several sensors [22]. Baquara 2 is another example of a conceptual framework that analyzes and semantically enriches trajectories by using a customizable process [7]. The MASTER project models the trajectory and its context using RDF and uses the rendezvous database to store the RDF data [13].
Alvares et al. [23] proposed a model that represents the essential parts of a trajectory (stops, moves, and semantics) and uses SQL to query the places visited, their types, and other information. Izquierdo et al. [15] addressed the problem of querying semantic trajectories using a stop-and-move representation. The authors described a formal framework in DL that introduces the syntax and semantics of trajectories and the mechanisms needed to express queries over their database. As a proof of concept, they used the TripBuilder [16] dataset, with georeferenced photos captured by Flickr users. The stops were enriched with the POI name and category, and the moves with the means of transport. The concepts described in DL were implemented in RDF, and the queries were expressed in SPARQL.
The aforementioned studies use complex data models to represent semantic trajectories. These models contain many entities and relationships that make the overall scenario difficult to understand, and they become more complex as new aspects are added. Consequently, queries become harder to express, and they rely on exact matching without ranking. This study proposes a text representation of semantic trajectories, resulting in a simpler data model that reduces memory requirements and improves query performance. Moreover, because results are ranked by similarity instead of requiring exact matches, a query returns more results, reducing user frustration caused by empty answers.

SETHE: A Semantic Trajectory Retrieval Approach
In this section, we present the SETHE (SEmantic Trajectory HuntEr) framework, a new approach for representing semantic trajectories and their aspects using text processing. In SETHE, POI names, categories, and other aspects are represented as sequences of terms. Queries are performed using text processing, and the result set is ranked according to how close each trajectory is to the query. SETHE searches for trajectories that contain at least one sub-sequence corresponding to the query and whose semantic values are closest to the aspects specified in the user query. We also formalize the trajectory model underpinning SETHE.

Basic Concepts
There are several similar definitions of trajectories [18,20,23]. However, we present a new approach for representing and dealing with trajectory data, in which a semantic trajectory is represented as a text vector. Hence, a query may be expressed using regular expressions, and a ranking approach is used to return not only exact matches, but also similar results. Considering that a POI is a specific location in which someone may be interested [24], we present a new aspect-based semantic trajectory definition.

Definition 1.
An aspect-based semantic trajectory is a sequence of POIs $T = \langle p_1, p_2, \ldots, p_n \rangle$ ordered chronologically and represented by a set of tuples $ST = \{\langle name, st_1 \rangle, \langle category, st_2 \rangle, \langle aspectName_3, st_3 \rangle, \ldots, \langle aspectName_a, st_a \rangle\}$, where $a \geq 2$. Every $st_i$ is a sequence of $n$ text values, one for each point $p_j \in T$. A POI of $ST$ can be simple, with only a name and a category, so that $ST = \{\langle name, st_1 \rangle, \langle category, st_2 \rangle\}$, or more complex, with semantic aspects other than the POI name and category. Hence, names and categories are mandatory, and other aspects are optional.
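As an illustration, Definition 1 can be mirrored by a small Python structure that keeps one text sequence per aspect. This is a sketch under stated assumptions, not SETHE's implementation: the class name AspectTrajectory, the method poi, and the aspect keys transport and temperature are hypothetical, and the "museum" category of Museo delle Sinopie is inferred from the museum query discussed in the Introduction rather than stated for Figure 1.

```python
from dataclasses import dataclass


@dataclass
class AspectTrajectory:
    """Hypothetical container for an aspect-based semantic trajectory (Definition 1)."""
    aspects: dict  # aspect name -> list of text values, one per POI p_1..p_n

    def __post_init__(self):
        # Names and categories are mandatory (a >= 2); other aspects are optional.
        for required in ("name", "category"):
            if required not in self.aspects:
                raise ValueError(f"missing mandatory aspect: {required}")
        # Every st_i must hold exactly one value per POI.
        if len({len(values) for values in self.aspects.values()}) != 1:
            raise ValueError("all aspect sequences must have the same length n")

    def __len__(self) -> int:
        return len(self.aspects["name"])

    def poi(self, j: int) -> dict:
        """All aspect values of the j-th POI (0-based)."""
        return {aspect: values[j] for aspect, values in self.aspects.items()}


# The Figure 1 trajectory; the "museum" category is inferred, not stated.
st_f1 = AspectTrajectory({
    "name": ["Cappella dal Pozzo", "Museo delle Sinopie", "Teatro Sant'Andrea"],
    "category": ["chapel", "museum", "theater"],
    "transport": ["walk", "bus", "taxi"],
    "temperature": ["22", "21", "23"],
})
```

Reading the structure column-wise gives one POI with all its aspects; reading it row-wise gives the per-aspect trajectories of the multidimensional view.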
Applying Definition 1, the Figure 1 trajectory, $ST_{f1}$, is represented by one text sequence for each of its four aspects: name, category, means of transport, and temperature.
A sub-sequence is another important concept for understanding semantic trajectory query processing. According to Gusfield [25], whereas a sub-trajectory consists of consecutive points of a trajectory T, the POIs of a sub-sequence of T need not be consecutive. Sub-sequences are used during query processing to retrieve POI sequences from T that match the user's query.
According to Definition 2, the sub-sequence $SST_{f1}$ is a sub-sequence of $ST_{f1}$: its POIs appear in the same order as in $ST_{f1}$, but they are not necessarily consecutive.
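The difference between a sub-trajectory and a sub-sequence can be made concrete with two small Python predicates. This is an illustrative sketch: the function names are hypothetical, and each trajectory is reduced to a plain list of terms for a single aspect.

```python
def is_subsequence(candidate, sequence):
    """True if candidate's terms appear in sequence in the same order, gaps allowed."""
    it = iter(sequence)
    # "x in it" consumes the iterator up to and including the first match,
    # so each later term must be found after the previous one.
    return all(x in it for x in candidate)


def is_subtrajectory(candidate, sequence):
    """True if candidate occupies consecutive positions of sequence."""
    k = len(candidate)
    return any(list(sequence[i:i + k]) == list(candidate)
               for i in range(len(sequence) - k + 1))
```

For the category trajectory chapel, museum, theater, the pair chapel, theater is a sub-sequence but not a sub-trajectory, while museum, theater is both.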

Query Processing
To perform a search in a semantic trajectory dataset, users must provide some information, such as POI names, categories, and/or other aspects. The stops can be identified by POI name or category. Figure 2 shows a graphical example of a query in which a user searches for trajectories that pass through a museum and end at the Leaning Tower of Pisa. In addition to specifying the stops, the user may also specify the aspects associated with each point. In Figure 2, the user wants trajectories of people who take a taxi to a museum with a rating of four in rainy weather and then travel by bus to the Leaning Tower of Pisa, with a rating of five, in clear weather.
Using exact matching for the query in Figure 2 may return few or no results. To overcome this limitation, SETHE uses a ranking algorithm to find the trajectories that most closely match the query. For this, the user must also provide a distance function and a weight for each aspect type. The distance function calculates how far a query aspect value is from the corresponding value in a given trajectory, and the weight represents the importance of each aspect in the user query. Both influence the final ranking. Table 2 lists examples of these functions, comparing arbitrary values with the aspect values shown in Figure 2. In this example, we used a word2vec function for the means of transport, the equal function for the weather aspect, and the Euclidean function for the rating aspect. The word2vec function calculates the semantic distance between terms. The equal function returns one of two values: 0 (zero) when the terms are different and 1 (one) when they are equal. The Euclidean function is computed as $\mathrm{Euclidean}(a, b) = |a - b|$.
The following subsections detail the SETHE querying process, which comprises several steps: building a query to be interpreted by the framework, building a vector representation of the query, retrieving the sub-sequences with the stop points specified in the query, building a vector representation of each retrieved sub-sequence and its aspects, and calculating the similarity between the query and sub-sequence vectors.
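The three functions mentioned above can be sketched in Python as follows. This is an assumption-laden illustration: word2vec is replaced by cosine distance over a hypothetical three-dimensional toy embedding table (a real deployment would take the vectors from a trained word2vec model), the function names are hypothetical, and the equal function follows the 0/1 convention stated in the text.

```python
import math


def equal_fn(a: str, b: str) -> int:
    """Equal function as described in the text: 1 if the terms are equal, else 0."""
    return 1 if a == b else 0


def euclidean(a: float, b: float) -> float:
    """One-dimensional Euclidean distance |a - b|, e.g., between two ratings."""
    return abs(a - b)


# Hypothetical toy embeddings standing in for a trained word2vec model.
TOY_EMBEDDINGS = {
    "bus": (0.9, 0.1, 0.0),
    "tram": (0.8, 0.2, 0.1),
    "taxi": (0.1, 0.9, 0.2),
}


def word2vec_distance(a: str, b: str, emb=TOY_EMBEDDINGS) -> float:
    """Cosine distance (1 - cosine similarity) between the vectors of two terms."""
    u, v = emb[a], emb[b]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm
```

With these toy vectors, bus is much closer to tram than to taxi, which is the kind of graded answer that exact matching cannot provide.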

Query Building
A query is a sequence of expressions that may contain POI names, categories, and other aspects describing the semantic trajectories the user is interested in. For example, using the category sequence (museum; tower) and the means of transport sequence (Bus; Taxi), SETHE looks for routes that use a bus to arrive at a museum and a taxi to arrive at a tower.
During the search process, SETHE considers the name or category of the POIs at any position in the trajectory. However, regular expression features can be used to specify the position of a POI in the textual path. For example, the symbol ^ indicates that the POI must be at the beginning of a trajectory, and the symbol $ indicates that the POI must be at the end. Therefore, a query with the sequence (^museum; tower$) looks for trajectories that start at a museum and end at a tower. Table 3 shows five regular expression symbols used in the query process and two new symbols ((?-) and ∼) that help build query expressions. Inspired by [12], we define the query as follows. To facilitate query comparison, we use only the distance function for the optional aspects.
Following Definition 3, we can express the query depicted in Figure 2 as a tuple $Q_{f3}$. This query specifies two points: the first is a category (museum), and the second is a POI name (Torre Pendente di Pisa). Therefore, sequence $E$ must contain two tuples (name and category), and each sequence $e_i$ must have two points, where $e_1 = \langle .*, \text{Torre Pendente di Pisa} \rangle$ and $e_2 = \langle \text{museum}, .* \rangle$. The regular expression .* is used when POI names or categories are unknown. The optional aspects in Figure 2 are means of transport, weather, and rating. The weight of each aspect depends on the user's choice. In this example, we gave the highest priority to the means of transport and the lowest to the weather aspect; therefore, we assigned the weight sequence $W_{f3} = \{0.5, 0.2, 0.3\}$. To calculate the distance between two aspect values, we use the word2vec function for the means of transport, the equals function for the weather aspect, and the Euclidean function for the rating aspect; therefore, $D_{f3} = \{word2vec, equals, Euclidean\}$. We adopted the threshold sequence $L_{f3} = \{1, 1, 5\}$: 1 (one) is the highest possible value for both the word2vec and equals functions, and 5 is used for the rating aspect, whose values range from 1 to 5. Together, these elements specify query $Q_{f3}$.
SETHE transforms a query into text to compare it with the textual trajectory database. This process is divided into four main steps.

1. Using the regex function to obtain the trajectories that pass through POIs whose names and categories match the expressions.
2. Extracting the sub-sequences of trajectory T in which both the name and category of the POIs match the regular expressions of E.
3. Using distance functions and aspect weights to calculate the similarity coefficient between the query and the sub-sequences.
4. Ranking the results by coefficient in descending order.
The function regex(text, pattern) is used to determine the trajectories. The text parameter takes one of two sequences: either the POI name sequence $st_1$ or the POI category sequence $st_2$. The pattern parameter is a regular expression formed by concatenating the elements of the corresponding $e_i$, joined by the regular expression (.*). The regex function indicates whether $st_i$ has at least one sub-sequence that matches the pattern. In the $Q_{f3}$ example, the pattern is (.*)(.*)(Torre Pendente di Pisa) when the text is $st_1$ and (museum)(.*)(.*) when the text is $st_2$.
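A minimal Python sketch of the pattern construction and of regex(text, pattern) follows, assuming each textual trajectory is stored as a single string whose terms are separated by ", " (the separator is an assumption). The helper names build_pattern and matches are hypothetical.

```python
import re


def build_pattern(expressions):
    """Wrap each element of e_i in a group and join the groups with (.*)."""
    return "(.*)".join(f"({exp})" for exp in expressions)


def matches(text, expressions):
    """regex(text, pattern): True if the expressions occur in order in text."""
    return re.search(build_pattern(expressions), text) is not None


# Textual trajectories (the ", " separator is assumed).
st1 = "Cappella dal Pozzo, Museo delle Sinopie, Torre Pendente di Pisa"
st2 = "chapel, museum, tower"

e1 = [".*", "Torre Pendente di Pisa"]  # POI names
e2 = ["museum", ".*"]                  # categories

# A trajectory is kept only if both regex tests succeed.
selected = matches(st1, e1) and matches(st2, e2)
```

Because the whole trajectory is one string, the anchors keep their positional meaning: (^museum)(.*)(tower$) rejects the trajectory above, since it starts at a chapel. A plain term such as museum would also match inside a longer term such as "art museum", a limitation of this simple sketch.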
Finally, SETHE applies two regex functions over the database to find the semantic trajectories that pass through a museum and the Leaning Tower of Pisa. The final expression is as follows:
regex($st_1$, "(.*)(.*)(Torre Pendente di Pisa)") and regex($st_2$, "(museum)(.*)(.*)")
Before proceeding, the query must be transformed into a vector representation. A query $Q$ is represented by a sentence $\delta_q$ and a vector $\vec{v}_q$. The sentence $\delta_q = (y_1^1, y_1^2, \ldots, y_1^b, y_2^1, y_2^2, \ldots, y_2^b, \ldots, y_m^1, y_m^2, \ldots, y_m^b)$ is the interleaved concatenation, POI by POI, of the regular expressions $\alpha_i$ of the optional aspects, where $y_i^j$ is the value of aspect $j$ at query POI $i$. The coordinates of the vector $\vec{v}_q = (v_1, v_2, v_3, \ldots, v_z)$, with $z = m \cdot b$, are the interleaved weights associated with each POI aspect, where Equation (1) identifies the aspect weight in $W$.
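The construction of the query sentence and vector can be sketched as follows. The sketch assumes that Equation (1) simply cycles through W, so each coordinate carries the weight of the aspect at the same position of the sentence; the function names are hypothetical, and the values are those of query Q_f3 in Figure 2.

```python
def query_sentence(aspect_values):
    """Interleave the optional aspect values POI by POI.

    aspect_values[i][j] holds aspect j of query POI i, so the result is
    (y_1^1, ..., y_1^b, y_2^1, ..., y_m^b)."""
    return [value for poi in aspect_values for value in poi]


def query_vector(weights, num_pois):
    """Assumed form of Equation (1): coordinate k carries the weight of aspect k mod b."""
    b = len(weights)
    return [weights[k % b] for k in range(num_pois * b)]


# Query Q_f3, aspects ordered as (means of transport, weather, rating).
aspects_f3 = [["taxi", "rainy", "4"],   # museum
              ["bus", "clear", "5"]]    # Torre Pendente di Pisa
W_f3 = [0.5, 0.2, 0.3]

delta_q = query_sentence(aspects_f3)
v_q = query_vector(W_f3, len(aspects_f3))
```

Keeping the sentence and the vector aligned index by index is what later allows each term of a sub-sequence to be compared with the query term at the same position.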

Discovering Sub-Sequences
After retrieving the semantic trajectories with the regex query, SETHE calculates a vector representation for each ST sub-sequence that matches all regular expressions defined in $e_1$ and $e_2$. We describe this process with four algorithms. Algorithm 1, the main function, extracts sub-sequences from an aspect-based semantic trajectory and invokes the functions described in the other algorithms. The calcSubsequences function receives the parameters $st_1$, $st_2$, $e_1$, and $e_2$. A tree is used as the data structure for determining the trajectory sub-sequences: the tree starts with an empty node, which is the root, and the root's children are the POIs that start a sub-sequence. According to Definition 3, regular expressions $e_1$ and $e_2$ must have the same size. For each expression index, Algorithm 1 first checks whether the element of $e_1$ is ".*"; if so, it uses $e_2$ instead. The regex function looks for all matches of each regular expression $e_i$ in the text. Each item in the matches variable has a POI text (name or category) and the POI position in the trajectory. A node is created for each match and added to the tree. The new node receives the text value of the match, the text position in $st_i$, and the index of the regular expression in $e_i$. If it is the first regular expression, the node is added as a child of the tree root; otherwise, the recursive function addNewNode looks for the correct position of the node in the tree. The fixTree function removes all tree branches whose heights are less than the size of $e_i$, so only the ST sub-sequences that match $e_1$ and $e_2$ remain in the tree. The extractSubsequence function traverses the tree nodes to extract the trajectory sub-sequences and stores them in the listSubs variable. At the end of the calcSubsequences function, the listSubs variable holds all the sub-sequences of ST.
    …
    matches ← regex(text, exp)
    for all ma in matches do
        node ← new node
        node.text ← ma.text
        node.textPosition ← ma.position
        node.expIndex ← index
        if index == 1 then
            root.addChildren(node)
        else
            for all nodeChild in root.children do
                addNewNode(nodeChild, node)
    …
    return listSubs
end function
Algorithm 2 describes the recursive function addNewNode. This function receives two parameters: the parent node and the new child node created by Algorithm 1. The new node is added as a child of the parent node only if two constraints hold: first, its position must be greater than the parent node's position; second, its index must be exactly one unit above the parent node's index.
    …
        for all childNode in father.children do
            addNewNode(childNode, newNode)
        end for
    end if
end function
Algorithm 3 describes the recursive function fixTree. This function receives three parameters: the tree, the node to be checked, and the leaf node height. Each node of the tree is visited recursively until the leaf nodes are reached. If the index of a leaf node differs from the height (the size of $e_i$), the node is removed from the tree. This process is repeated until all nodes with no children and an index less than the height have been removed.

Algorithm 3
Remove Incomplete Sub-sequence from the Tree
function FIXTREE(tree, node, height)
    for all childNode in node.children do
        fixTree(tree, childNode, height)
    end for
    if node.children is empty then
        if node.expIndex != height then
            tree.removeNode(node)
        end if
    end if
end function
Algorithm 4 describes the behavior of the function extractSubsequence. This recursive function receives three parameters: the tree node, the sub-sequence currently being processed, and the sub-sequence list, which stores the final result. The algorithm iterates through all child nodes and adds their values to the subsequence variable. If a node has more than one child, more than one sub-sequence contains that node; therefore, the subsequence is cloned, and extractSubsequence is invoked again with the child node, the clone, and the list of sub-sequences listSub. When the node parameter is empty, there are no more children to process, so the subsequence value is a complete ST sub-sequence, and it is added to listSub. At the end of the function, listSub holds all ST sub-sequences that satisfy both $e_1$ and $e_2$.
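The four algorithms can be condensed into the following Python sketch. It is not the authors' code, and several details are assumptions: the names are hypothetical; each trajectory is given as two lists of terms (names and categories); a POI matches when its name and category fully match the corresponding elements of e_1 and e_2, a stricter variant that coincides with Algorithm 1's ".*" check whenever one of the two elements is ".*"; and positional anchors such as ^ and $ are not interpreted. Instead of sharing one node among several parents, the sketch attaches a fresh copy to each valid parent so that the structure remains a tree.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str = ""       # matched POI term (name or category)
    position: int = -1   # index of the term in the trajectory
    exp_index: int = 0   # 1-based index of the matched expression; 0 for the root
    children: list = field(default_factory=list)


def add_new_node(father, node):
    """Algorithm 2 (sketch): attach node below every valid parent in the subtree."""
    if node.exp_index == father.exp_index + 1:
        if node.position > father.position:
            # A fresh node per parent keeps the structure a tree.
            father.children.append(Node(node.text, node.position, node.exp_index))
    elif node.exp_index > father.exp_index + 1:
        for child in father.children:
            add_new_node(child, node)


def fix_tree(node, height):
    """Algorithm 3 (sketch): prune branches shorter than height; True if node survives."""
    node.children = [c for c in node.children if fix_tree(c, height)]
    return bool(node.children) or node.exp_index == height


def extract_subsequences(node, current, result):
    """Algorithm 4 (sketch): collect every root-to-leaf path as (position, text) pairs."""
    if not node.children:
        if current:
            result.append(current)
        return
    for child in node.children:
        # current + [...] builds a new list, playing the role of the clone.
        extract_subsequences(child, current + [(child.position, child.text)], result)


def calc_subsequences(st1, st2, e1, e2):
    """Algorithm 1 (sketch): all sub-sequences of (st1, st2) that match e1 and e2."""
    if len(e1) != len(e2):
        raise ValueError("e1 and e2 must have the same size")
    root = Node()
    for index, (exp_name, exp_cat) in enumerate(zip(e1, e2), start=1):
        for pos, (name, cat) in enumerate(zip(st1, st2)):
            if re.fullmatch(exp_name, name) and re.fullmatch(exp_cat, cat):
                text = cat if exp_name == ".*" else name
                add_new_node(root, Node(text, pos, index))
    fix_tree(root, len(e1))
    result = []
    extract_subsequences(root, [], result)
    return result
```

For categories museum, church, tower, chapel, museum and the expressions (church|chapel) then museum, the sketch returns two sub-sequences, church then the last museum and chapel then the last museum; the first museum is discarded because it occurs before every church or chapel.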

Transforming a Sub-Sequence into a Vector
After identifying the $SST_1$ and $SST_2$ sub-sequences, SETHE creates a sentence for each of them, as was done for query $Q_{f3}$. A new sentence $\delta_{sst}$ and a vector $\vec{v}_{sst}$ are created for each sub-sequence, where $\delta_{sst} = (r_1^1, r_1^2, \ldots, r_1^b, r_2^1, r_2^2, \ldots, r_2^b, \ldots, r_m^1, r_m^2, \ldots, r_m^b)$ and $\vec{v}_{sst} = (\nu_1, \nu_2, \ldots, \nu_z)$, such that each $r_i^j$ is a value of an optional aspect of the sub-sequence and each $\nu_k$ is the score of the $k$-th term of $\delta_{sst}$. The score function uses a distance function to calculate how close each term of $\delta_{sst}$ is to the term with the same index in $\delta_q$.
The smaller the distance, the higher the score of the term of $\delta_{sst}$. Given two terms, one from query $Q$ and the other from a sub-sequence of ST, such that $y_i^j \in \delta_q$ and $r_i^j \in \delta_{sst}$, their score is computed by Equation (2) from the weight $w_j \in W$, the distance function $d_j \in D$, and the threshold $thr_j \in L$ of aspect $j$.
The trajectory coefficient of ST is the highest similarity value between the query vector of $Q$ and the vectors of the ST sub-sequences. The similarity function must return a value between zero and one; examples include the Jaccard and cosine functions. Let $V_{ts}$ be the set of all vectors created from the ST sub-sequences. The coefficient is calculated as
$$\mathrm{coefficient}(Q, ST) = \max_{\vec{v} \in V_{ts}} \mathrm{similarity}(\vec{v}_q, \vec{v}) \qquad (3)$$
where $0 \leq \mathrm{similarity}(\cdot, \cdot) \leq 1$.
A composite query $CQ = (Q_1, Q_2, \ldots, Q_n)$ consists of multiple queries gathered into a single query: SETHE executes each query separately and merges their results into a single result set.
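The scoring and ranking steps can be sketched as follows. The sketch makes two assumptions, both labeled in the code: it takes the term score of Equation (2) to be $w_j \cdot \max(0, 1 - d_j / thr_j)$, which yields exactly $w_j$, the matching coordinate of $\vec{v}_q$, for a perfect match; and it uses the weighted (Ruzicka) form of the Jaccard index, sum of minima over sum of maxima, for real-valued vectors. Distances are read as "smaller means closer", so an exact match has distance 0. The coefficient is the maximum similarity over a trajectory's sub-sequence vectors, as described above, and all function names are hypothetical.

```python
def term_score(weight, distance, threshold):
    """Assumed Equation (2): full weight at distance 0, nothing at or beyond the threshold."""
    return weight * max(0.0, 1.0 - distance / threshold)


def weighted_jaccard(u, v):
    """Assumed Jaccard variant: sum of minima over sum of maxima, in [0, 1] for u, v >= 0."""
    den = sum(max(a, b) for a, b in zip(u, v))
    return sum(min(a, b) for a, b in zip(u, v)) / den if den else 0.0


def coefficient(v_q, sub_vectors):
    """Highest similarity between v_q and the vectors of a trajectory's sub-sequences."""
    return max((weighted_jaccard(v_q, v) for v in sub_vectors), default=0.0)


def rank(v_q, trajectories):
    """Trajectory ids paired with their coefficients, best match first."""
    scored = [(tid, coefficient(v_q, vectors)) for tid, vectors in trajectories.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# One POI with aspects (transport, weather, rating), W = (0.5, 0.2, 0.3), L = (1, 1, 5).
v_q = [0.5, 0.2, 0.3]
v_sst = [term_score(0.5, 0.0, 1),         # same means of transport
         term_score(0.2, 1.0, 1),         # different weather
         term_score(0.3, abs(4 - 3), 5)]  # rating 3 instead of 4
```

Here v_sst is approximately (0.5, 0.0, 0.24), so its similarity to v_q is about 0.74; a trajectory with an exactly matching sub-sequence reaches 1.0 and is ranked first.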

Running Example
This section presents an example of how SETHE works using a query and a database composed of three trajectories. Suppose a query Q looks for trajectories in which a mobile object first goes to the Leaning Tower of Pisa, then visits a chapel or a church, and later stops at a museum.
In this example, the means of transport has a higher weight than the rating. The query is then transformed into the sentence $\delta_q$, and Equation (1) is applied to obtain $\vec{v}_q$. For simplicity, only the trajectory categories and aspects are highlighted, and, for ease of explanation, an index is placed on each term of the trajectory. The second step is to identify the sub-sequences that satisfy the $e_1$ and $e_2$ expressions. Following Algorithm 1, a tree of paths is constructed, as depicted in Figure 3. Each level of the tree, except the root, represents an expression in either $e_1$ or $e_2$. Each POI of ST that satisfies regex($st_1$, $e_1$) or regex($st_2$, $e_2$) is added to the tree as a child of the nodes matching lower-index expressions. For example, in regex($f_4$, "(church|chapel)"), the category $f_4$ matches the regular expression (church|chapel), so the algorithm adds $f_4$ to the tree as a child of both $f_1$ and $f_3$. Category $h_4$, in contrast, is added only as a child of $h_3$ and not of $h_5$, because $h_4$ occurs before $h_5$ in the trajectory. The sub-sequences are then extracted from the POI tree. After identifying the trajectory sub-sequences, the next step is to calculate the score of each optional aspect to compose the vector that will be compared with the $\vec{v}_q$ vector.
Tables 4-7 present the sentences of each $ST_F$ sub-sequence, created by interleaving the optional aspects. Each term corresponds to an optional aspect of the sub-sequence, starting with the means of transport and rating aspects of each POI in the sub-sequence. As specified in query Q, the equals function is used to calculate the similarity between two values of the means of transport aspect; therefore, there are only two possible values: 1 (one) if the values are the same and 0 (zero) if they are different. For the rating aspect, we use the Euclidean distance function. Applying Equation (2) to each term, such as the rating value 4 (four), gives the score of each aspect. Tables 8 and 9 present the scores for the $ST_G$ sub-sequences, and Tables 10-12 contain the scores for the $ST_H$ sub-sequences.
Table 9. Scores for the sub-sequence $\sigma_{Gb}$.
The coordinates of each sub-sequence vector are the scores calculated for its terms, which yields one vector for each $ST_F$ sub-sequence. In this example, we use the Jaccard index, given by Equation (4), to calculate the similarity between the sub-sequence vectors and the query vector $\vec{v}_q$. Applying Equation (4) to the $ST_F$ vectors and $\vec{v}_q$, and then Equation (3), the coefficient for the $ST_F$ trajectory is 0.493. After performing the same process for the $ST_G$ and $ST_H$ trajectories, the $ST_H$ trajectory has the highest coefficient and is therefore the closest to the user query. The $ST_F$ trajectory comes second, and $ST_G$ is the least similar to query Q.

Experiments and Results
We used the TripBuilder dataset [16] to evaluate the performance of SETHE, following the evaluation of the framework described in [15], which defines a set of 10 queries for the city of Pisa, Italy.

Dataset
The TripBuilder RDF dataset contains 1,617,582 triples and 55,474 trajectories, modeled into the Trajectory, Stop, Move, Transportation, and POI classes. Figure 4 shows the UML representation of TripBuilder. A trajectory can have several stops and moves, as represented by the * symbol in the relationship. Each trajectory has start and end points, and each point represents a POI. The Move class represents the transition between two stops and is semantically enriched by the Transportation class.
To conduct the experiments with SETHE, we transformed the RDF TripBuilder dataset into a text dataset. We modeled a database to store the textual trajectories, as depicted in Figure 5. The Trajectory entity has a one-to-many relationship with each entity representing a different trajectory type. The POI entity contains the textual trajectories in which each point is the name of the place where the moving object stopped. The Category entity contains the category textual trajectories, the Move entity contains the textual trajectories of the transport means used to reach each stop, and the LocatedIn entity contains the trajectories of the regions where the moving object stopped.
Table 13 presents a sample of POI name trajectories, stored in the Value column. Table 14 shows category trajectories, and Table 15 presents some examples of transport mean trajectories. The first move has no associated transport mean and therefore receives the value N/A. In general, SETHE supports trajectories with missing information for a given aspect: when a POI lacks that aspect, the special value N/A is used. For example, consider a trajectory in which the first and third POIs have no transportation aspect. The transportation aspect of that trajectory is then represented as ⟨N/A, Subway, N/A, Taxi, Bus, Subway⟩.
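The conversion of per-stop aspect values into textual trajectories, including the N/A placeholder, can be sketched as below. This is a minimal Python sketch under assumptions: each stop is a dictionary of aspect values, terms are separated by ", " as in the example above, and the function name `to_textual_trajectories` and the category values are hypothetical. It is not the authors' RDF-to-text pipeline.

```python
from typing import Dict, List, Optional

MISSING = "N/A"  # special value for a POI that lacks a given aspect


def to_textual_trajectories(stops: List[Dict[str, Optional[str]]],
                            aspects: List[str]) -> Dict[str, str]:
    """Build one comma-separated textual trajectory per aspect.

    Each stop maps aspect names to values; a missing key or a None/empty value
    is written as N/A, so every aspect trajectory keeps one term per stop and
    the terms of different aspects stay aligned by position.
    """
    trajectories = {}
    for aspect in aspects:
        terms = [stop.get(aspect) or MISSING for stop in stops]
        trajectories[aspect] = ", ".join(terms)
    return trajectories


stops = [
    {"category": "tower"},
    {"category": "museum", "transportation": "Subway"},
    {"category": "church", "transportation": None},
    {"category": "chapel", "transportation": "Taxi"},
    {"category": "palace", "transportation": "Bus"},
    {"category": "museum", "transportation": "Subway"},
]
print(to_textual_trajectories(stops, ["category", "transportation"])["transportation"])
# N/A, Subway, N/A, Taxi, Bus, Subway
```

Keeping a placeholder instead of dropping the term preserves positional alignment, so the $i$-th term of every aspect trajectory still refers to the $i$-th stop.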

Results and Discussion
The experiments were carried out on a computer with a Core i7-7700 3.60 GHz processor, 32 GB of RAM, and a 500 GB HD, running GNU/Linux Ubuntu 18.04. We loaded the RDF dataset into the tuple database (TDB) of an Apache Jena Fuseki 4.3.2 server running on the Java platform jdk-16.0.2, and used PostgreSQL 13.2 to store the textual trajectory database. We converted the RDF triples to CSV spreadsheets, removed accents and special characters, and then loaded the data into the text database.
Izquierdo et al. [15] described a semantic trajectory search framework and specified ten queries for the city of Pisa to evaluate its performance. The same queries were used to evaluate the SETHE framework; they are listed in Table 16, and Table 17 shows how the SETHE framework is used to answer them. Some queries require POIs visited consecutively, so the proximity between stops was also used as a trajectory aspect. Here, proximity refers to the number of stops between two POIs. In SETHE, when a proximity attribute is set to the tilde symbol (∼), the closer two POIs are, the higher their score. Izquierdo et al. [15] use the equals operator to compare means of transport; therefore, we used the equals distance function for the transport mean aspect. In queries that use both aspects (transport mean and proximity), we assigned the same weight, 0.5, to each.
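The accent and special-character removal applied before loading the CSV data can be done with Unicode decomposition. The exact cleaning rules used in the experiments are not specified, so this Python sketch assumes a hypothetical rule: keep ASCII letters, digits, and single spaces, and replace everything else with a space. The function name `normalize_text` is also hypothetical.

```python
import re
import unicodedata


def normalize_text(value: str) -> str:
    """Strip accents and special characters from a literal before loading it.

    Hypothetical cleaning rule: keep ASCII letters, digits, and single spaces.
    """
    # NFKD splits accented letters into base letter + combining mark ("è" -> "e" + U+0300).
    decomposed = unicodedata.normalize("NFKD", value)
    # Drop the combining marks, keeping the base letters.
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace every remaining non-alphanumeric character with a space.
    cleaned = re.sub(r"[^A-Za-z0-9 ]+", " ", without_accents)
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", cleaned).strip()


print(normalize_text("Caffè dell'Ussero"))  # Caffe dell Ussero
```

Normalizing once at load time means the regular expressions in user queries can be written in plain ASCII and still match place names that originally carried accents.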

Q1
Trajectories that stop at a museum and then at a chapel.

Q2
Trajectories that stop at a tower, then stop at a chapel or church, then stop at a chapel or church again, and then at a museum.

Q3
Trajectories that stop at least once in a tower, and then at a museum.

Q4
Trajectories that stop at the Lion Tower and then at the Leaning Tower, or stop at the Leaning Tower and then at the Lion Tower.

Q5
Trajectories that begin at a museum and then end at a chapel.

Q6
Trajectories that stop at a museum and, later on, optionally end at a chapel or a church.

Q7
Trajectories that begin at a chapel, stop at zero or more chapels, and end at a chapel.

Q8
Trajectories that stop at a museum and then take a bus to a chapel.

Q9
Trajectories that begin at a chapel or a church, always move by bus between stops, and end at the Leaning Tower.

Q10
Trajectories that begin at a tower, then walk to take a bus to a church, and then, using any means of transportation, end at a palace.
In Table 17, when a query has no value for $e_1$, this value is assumed to be empty. All queries are simple, except for Q4, which is a composite query.
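To make the query semantics concrete, two of the queries above can be written as exact-match regular expressions over a category trajectory. This Python sketch is illustrative only: the patterns are hypothetical translations of Q5 and Q7 (Q5 is read as "first stop is a museum, last stop is a chapel, anything in between"), not the expressions listed in Table 17, and the ", " separator and the `matches` helper are assumptions. It acts as a plain exact-match filter, whereas SETHE ranks trajectories by similarity and can also return partial matches.

```python
import re
from typing import List

SEP = ", "  # assumed separator between terms of a textual trajectory

# Hypothetical exact-match patterns over a category trajectory.
QUERY_PATTERNS = {
    # Q5: begins at a museum and ends at a chapel (any stops in between)
    "Q5": r"museum(, [^,]+)*, chapel",
    # Q7: begins at a chapel, zero or more chapels, ends at a chapel
    "Q7": r"chapel(, chapel)*, chapel",
}


def matches(query_id: str, categories: List[str]) -> bool:
    """True if the whole category trajectory matches the query's pattern."""
    text = SEP.join(categories)
    return re.fullmatch(QUERY_PATTERNS[query_id], text) is not None


print(matches("Q5", ["museum", "tower", "chapel"]))   # True
print(matches("Q7", ["chapel", "church", "chapel"]))  # False
```

Using `re.fullmatch` anchors the pattern at both ends of the trajectory, which is what "begins at" and "ends at" require; a trajectory that merely contains a museum followed by a chapel does not qualify for Q5 under this reading.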
We compared the performance of the SETHE PostgreSQL queries with that of the SPARQL queries. Regular expressions were extended with the ∼ and (?−) operators. Each query was executed ten times for both SPARQL and SETHE. Figure 6 shows the average execution time for each query on a logarithmic scale. SETHE had a lower response time than the SPARQL queries [15] in most cases. The Q10 SPARQL query is not charted because it took approximately one hour to run. Owing to its ranking algorithm, SETHE also achieves better recall, as shown in Figure 7. The top-ranked SETHE results are the same as those of the SPARQL query; they match the user query exactly. The blue bars in Figure 7 represent the results returned by SETHE that were not retrieved by SPARQL. These trajectories are similar to the user query specification but do not match the SPARQL queries exactly.
Another important issue is the storage space required by each approach. The Apache Jena server uses TDB to store the RDF graph. Figure 8 compares the storage space used by TDB and PostgreSQL: TDB required more than five times the space required by our textual approach.

Conclusions
The insertion of context-based information into trajectory data results in semantically enriched trajectories. Thus, trajectories may be analyzed from different perspectives, also known as aspects. Each perspective enables the analysis of spatiotemporal, context-based information. In this study, these trajectories are called aspect-based semantic trajectories. Depending on the application, the trajectory aspects may vary significantly in number and type. Some related approaches represent semantic trajectories using RDF graphs, ontologies, or conceptual models in which the search process is based only on exact matching. Depending on the complexity of the query, exact matching may yield few or no results.
This article proposes the SETHE framework, a search engine for querying aspect-based semantic trajectory datasets using text processing. SETHE implements partial matching, using a similarity coefficient between each aspect-based semantic trajectory and the user's query to rank the result set. In traditional semantic trajectory search tools, no weight is associated with a given aspect; hence, all aspects have the same priority for the user. Our approach uses a distance function and a weight assigned to each aspect, both of which affect the ranking. The result set contains trajectories ranked by coefficients calculated from the distance functions and weights. With this ranking approach, the trajectories closest to the user query can be returned even when they do not match it exactly.
We also present a new approach to representing aspect-based semantic trajectory data, in which each trajectory is represented only by text. The experiments showed that this approach requires less storage space for trajectories and their aspects than an RDF graph, one of the main semantic trajectory representations.
To assess the relevance of our work, we compared our results with those of one of the most recent studies on semantic trajectory search. The results showed that SETHE had a better average response time. Furthermore, because it uses a ranked partial-match approach, a SETHE query usually returns more results. In future work, we intend to use the normalized discounted cumulative gain (NDCG) to measure ranking quality. We also plan to extend the SETHE framework with multidimensional modeling so that users can apply roll-up and drill-down operations over trajectory aspects. Finally, we will implement a graphical user interface (GUI) and perform a user assessment based on the ISO 9241 standard (parts 14, 16, and 17).