Towards an Extensible and Text-Oriented Analytical Semantic Trajectory Framework

Almeida, Damião Ribeiro de; Baptista, Cláudio de Souza; Andrade, Fabio Gomes de; Paiva, Anselmo Cardoso de

doi:10.3390/ijgi14080292

by Damião Ribeiro de Almeida ¹, Cláudio de Souza Baptista ²,*, Fabio Gomes de Andrade ³ and Anselmo Cardoso de Paiva ⁴

¹ Federal Institute of Paraíba, Monteiro 58500000, Paraíba, Brazil

² Department of Computer Science, Federal University of Campina Grande, Campina Grande 58429900, Paraíba, Brazil

³ Federal Institute of Paraíba, Cajazeiras 58900000, Paraíba, Brazil

⁴ Applied Computing Center, Federal University of Maranhão, São Luís 65080805, Maranhão, Brazil

* Author to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2025, 14(8), 292; https://doi.org/10.3390/ijgi14080292

Submission received: 16 May 2025 / Revised: 20 July 2025 / Accepted: 23 July 2025 / Published: 28 July 2025

Abstract

Semantically enriched trajectories have attracted growing interest in recent research, driven by the need for more expressive and context-aware movement data analysis. Two primary approaches have emerged for the storage and management of such data: moving object databases, which operate at the transactional or operational level, and trajectory data warehouses (TDWs), which support analytical processing within decision support systems. Conventional TDW methodologies typically model semantic aspects of trajectories by introducing new dimensions into the data warehouse schema. However, this approach often requires structural modifications to the schema in order to accommodate additional semantic attributes, potentially resulting in significant disruptions to the architecture and maintenance of the underlying decision support systems. To overcome this limitation, we propose a novel TDW model that supports dynamic and extensible integration of semantic aspects, without necessitating changes to the schema. This design enhances flexibility and promotes seamless adaptability to domain-specific requirements. To enable such extensibility, we propose an innovative approach to representing semantic trajectories by leveraging natural language processing (NLP) techniques, without relying on traditional spatiotemporal features. This enables the analysis of semantic movement patterns purely through textual context. Finally, we present a comprehensive framework that implements the proposed model in real-world application scenarios, demonstrating its practical extensibility.

Keywords:

semantic trajectories; trajectory analytics; data warehouse

1. Introduction

Recent advancements in mobile technologies—including smartphones, wireless communication systems, and low-cost sensor networks—have significantly enhanced the ability to monitor various types of moving objects within geographic space. Moving objects encompass a broad range of entities, such as individuals, animals, vehicles (e.g., cars, buses), maritime vessels, and even meteorological phenomena like hurricanes.

Currently, Global Positioning System (GPS) remains the predominant technology for capturing the trajectories of moving objects [1]. A raw trajectory is typically represented as an ordered sequence of spatiotemporal points, each consisting of a timestamp and spatial coordinates such as latitude, longitude, and optionally, altitude [2].

Trajectory data plays a critical role in understanding the dynamics and behavior of moving objects. Through trajectory analysis, it is possible to extract valuable insights across a range of application domains—for example, identifying traffic congestion zones, analyzing human mobility patterns, determining optimal routing strategies, locating productive fishing areas, studying animal migration paths, and tracking the evolution of weather systems such as hurricanes [3].

While raw trajectories typically consist only of the object’s identifier, spatial coordinates, and timestamp at each recorded location, such data alone may be insufficient for complex analytical tasks. To support more advanced and semantically meaningful analysis, many contemporary applications enrich raw trajectory data with semantic information, capturing contextual attributes that describe the environment in which the movement occurred.

As defined by Emmanouilidis et al., a semantic aspect refers to any contextual information that may influence the delivery or quality of a service [4]. Such information can be derived from various sources, including environmental conditions, user profiles, or external systems. When trajectory data is enriched with semantic aspects, the resulting dataset is referred to as a semantic trajectory [5,6].

Semantic enrichment enables a deeper understanding of the behavior of moving objects by providing contextual details that are not captured by raw spatiotemporal data alone [7]. For instance, while a raw trajectory may describe the spatial and temporal sequence of visited locations, a semantic trajectory can reveal additional attributes such as points of interest (POIs) visited, transportation modes employed, or even the purpose of the movement. These enriched datasets facilitate high-level reasoning and advanced analytical tasks that extend beyond the simple mapping of positions between an origin and a destination.

A semantic trajectory is typically composed of a sequence of spatiotemporal points, each annotated with relevant semantic metadata. For example, Figure 1 depicts the semantic trajectory of an individual throughout a typical workday. In addition to location and time, the trajectory was annotated with physiological and behavioral data. The individual departed from her residence at 9:00 a.m. with a recorded heart rate of 75 bpm. She boarded a bus and arrived at her workplace approximately one hour later, during which her heart rate increased to 80 bpm. Her workday lasted from 10:00 a.m. to 5:00 p.m. Subsequently, she took the metro to a supermarket, at which point her heart rate measured 73 bpm. The day concluded with a walk home, during which her heart rate decreased to 70 bpm.

This example demonstrates how semantically enriched trajectories offer multi-faceted insights into mobility patterns, enabling researchers and decision-support systems to interpret user behavior within a richer and more meaningful context.

A semantic element associated with a point in a trajectory is referred to as an aspect [8]. Aspects can describe contextual information related either to the environment or to the moving object itself. Representative examples of aspects include the name and category of a point of interest (POI), weather conditions, mode of transportation, and physiological indicators such as the user’s heart rate [9]. Once aspect information is annotated onto a trajectory, the result is a complex, multi-dimensional object in which each point is enriched with semantically relevant attributes, providing a more comprehensive representation of the movement context. Efficient storage and analysis of such semantically enriched trajectory data is essential to supporting advanced query processing and extracting actionable insights. Traditional spatial database systems are capable of processing spatiotemporal queries such as “What is the average speed of individuals who departed from home and used public transport to visit a church?” or “What was the total distance traveled by individuals in New York City in 2012 who stopped at least once in Central Park on a rainy day?” However, the complexity and computational cost associated with these queries may render conventional systems inadequate for handling large-scale or high-dimensional datasets.

To address these challenges, trajectory data can be managed using either moving object databases (MODs) or data warehouses (DWs). In particular, trajectory data warehouses (TDWs) have gained prominence for their ability to support complex analytical queries with improved performance. This is achieved through a multidimensional modeling approach, in which data from various operational sources undergoes an extract–transform–load (ETL) process. During ETL, data is semantically transformed and temporally organized into fact tables and dimension tables, adhering to either a star or a snowflake schema [10].

In this model, facts represent numerical measures or key performance indicators relevant to analytical tasks (e.g., travel time, distance, number of stops), while dimensions provide descriptive attributes used to contextualize and aggregate the facts (e.g., location, time, aspect type). These dimensions are commonly organized into hierarchies, enabling data abstraction at multiple levels of granularity.

Analytical queries over TDWs are typically executed using online analytical processing (OLAP) tools. OLAP enables multidimensional exploration through operations such as drill-down and roll-up, which navigate the dimension hierarchies. The drill-down operation transitions from aggregated data at a higher hierarchical level to more detailed, fine-grained data. Conversely, the roll-up operation aggregates data by ascending to a more general level within the same hierarchy.
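As a minimal sketch of these two operations, the following Python fragment aggregates a distance measure along an assumed time hierarchy (day < month < year). The fact rows, the `LEVELS` mapping, and the function names are hypothetical illustrations, not part of any OLAP tool or of the framework described in this article.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, month, day, distance_km).
FACTS = [
    (2012, 11, 3, 4.0),
    (2012, 11, 3, 1.5),
    (2012, 11, 17, 2.5),
    (2012, 12, 1, 3.0),
]

# Assumed time hierarchy: number of leading key fields used at each level.
LEVELS = {"year": 1, "month": 2, "day": 3}


def aggregate(facts, level):
    """Sum the distance measure, grouping by the chosen hierarchy level."""
    width = LEVELS[level]
    totals = defaultdict(float)
    for row in facts:
        totals[row[:width]] += row[3]
    return dict(totals)


def drill_down(facts, parent_key, level):
    """Aggregate at a finer level, restricted to one member of a coarser level."""
    selected = [row for row in facts if row[:len(parent_key)] == parent_key]
    return aggregate(selected, level)


by_month = aggregate(FACTS, "month")                 # roll-up: day -> month
by_year = aggregate(FACTS, "year")                   # roll-up: month -> year
november = drill_down(FACTS, (2012, 11), "day")      # drill-down: month -> day
```

Here the roll-up simply shortens the grouping key, while the drill-down restricts the facts to one member of the coarser level before regrouping at the finer one.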

When tailored specifically to handle spatiotemporal and semantic aspects of trajectory data, such DWs are referred to as TDWs [11]. They provide a robust infrastructure for decision support systems that analyze mobility behavior in semantically rich contexts.

A significant limitation of traditional TDWs lies in their inability to effectively represent and process semantic trajectories. To overcome this shortcoming, the concept of semantic trajectory data warehouses (STrDWs) has emerged, offering support for both raw spatiotemporal data and associated semantic attributes [12,13].

Conventional STrDWs typically address semantic aspects by introducing a distinct dimension for each aspect type represented in the trajectory data. For example, Figure 2 illustrates the conceptual schema of a representative STrDW, composed of five dimensions—means of transportation, time, activity, space, and moving object—and two measures: distance and duration. In this design, semantic aspects such as transportation mode and activity type are modeled as individual dimensions. Although effective in fixed-schema scenarios, this approach presents a fundamental limitation: the schema becomes rigid and domain-specific, thereby limiting its adaptability to applications that require new semantic aspects.

As an illustration, the schema in Figure 2 cannot accommodate queries involving other relevant aspects such as weather conditions or physiological indicators (e.g., heart rate) without requiring modifications to the underlying DW schema. Furthermore, the continual introduction of new dimensions for emerging semantic attributes can significantly increase schema complexity, leading to higher maintenance overhead and reduced portability. To address these challenges, this article proposes a novel STrDW architecture, termed the aspect trajectory data warehouse (ATrDW), specifically designed to promote schema extensibility, semantic generality, and cross-domain adaptability.

Our Main Contributions

The proposed ATrDW introduces several key innovations that collectively address the limitations of existing STrDWs. The major contributions of this work are as follows:

A flexible and extensible data warehouse model for semantic trajectories: we present a novel TDW architecture that supports the integration of an unbounded variety of semantic aspects without requiring schema modification. Unlike traditional STrDW approaches, which rigidly bind each semantic aspect to a dedicated dimension, our model decouples semantic attributes from the DW schema structure. This design enables users to enrich trajectory data with new types of contextual information—such as weather, physiological data, social context, or events—without altering the core schema. This substantially enhances the flexibility, maintainability, and reusability of the warehouse across multiple application domains, and significantly reduces the structural coupling that commonly limits conventional TDW architectures.
A unified text-based representation for semantic aspects enabling NLP-driven analytics: in a departure from classical dimension-based modeling, we propose a text-oriented abstraction layer for representing semantic aspects of trajectory data. All aspects—regardless of source, structure, or semantic category—are encoded as free-form text. This design enables the execution of semantic queries over enriched trajectory data. By incorporating natural language processing (NLP) techniques—such as the use of regular expressions, pattern matching, and textual similarity metrics—the model supports semantic-aware querying and analytics that transcend the capabilities of traditional OLAP operations. This approach enables new analytical opportunities in trajectory-based decision support systems.
An integrated and extensible framework for heterogeneous trajectory data management: we implemented a complete framework that operationalizes the ATrDW model through a robust ETL pipeline, capable of ingesting and processing trajectory data from diverse sources. The framework supports the extraction of raw spatiotemporal data and semantic annotations, the semantic enrichment of trajectory points, and the seamless integration of this data into a single, unified ATrDW schema. It is designed to handle heterogeneous datasets with varying formats and semantic structures. This enables researchers and practitioners to deploy the ATrDW in real-world scenarios without requiring extensive customization or schema redesign. This framework was validated with two different data sets.

Collectively, these contributions represent a significant advancement in the field of semantic trajectory data management, offering a more adaptive, semantically rich, and analytically powerful solution for a wide range of domains, including urban analytics, mobility intelligence, health monitoring, and environmental tracking.

In software engineering, a framework is a reusable, semi-complete structure that provides generic functionality which can be selectively specialized through user-written code to build specific applications [14]. To demonstrate the extensibility and applicability of the proposed framework, we conducted an experimental evaluation using two real-world datasets: the Foursquare dataset [9] and the TripBuilder dataset [15]. These datasets vary in both structure and semantic richness, enabling the validation of the framework’s capability to integrate heterogeneous semantic aspects without requiring modifications to the schema. Furthermore, we executed a set of analytical queries employing OLAP operations, including roll-up and drill-down, to assess the framework’s capability to support multidimensional navigation and semantic-aware analytical tasks.

The remainder of this article is organized as follows. Section 2 introduces the foundational concepts essential for understanding the problem domain addressed in this study. Section 3 reviews related work in the fields of trajectory data warehousing and semantic enrichment. Section 4 provides a comprehensive and detailed description of the proposed ATrDW model. Section 5 addresses the ETL process responsible for populating the ATrDW with raw and semantic trajectory data. Section 6 focuses on the experimental evaluation, showcasing the model’s extensibility and analytical capabilities using the aforementioned datasets. Finally, Section 7 concludes the paper and outlines directions for future research.

2. Basic Concepts

Spaccapietra and Parent define a semantic trajectory as a temporally ordered sequence of spatial positions, each annotated with contextual information [16]. Expanding on this concept, Chen et al. [17] and Zheng et al. [18] introduce the POI-based semantic trajectory, which models movement as a finite sequence of POIs, where each POI is a geo-textual object encapsulating both its geographical location and a textual description. Furthermore, Mello et al. characterize aspects as contextual attributes associated with a trajectory, which may reflect environmental conditions, user-specific data, or other domain-relevant metadata [8].

Building upon these foundational models, we introduce the notion of a multi-aspect POI-based semantic trajectory, which enriches POI-based trajectories with multiple semantic aspects. This structure is formally defined below.

Definition 1.

A multi-aspect POI-based semantic trajectory $AT$ is a chronologically ordered sequence of POIs, represented as $AT = \langle P_{1}, P_{2}, \dots, P_{n} \rangle$, where each point $P_{i}$ is a tuple defined as

$P_{i} = (MO, trajID, location, name, category, time, asp)$

such that

$MO = \{id, type\}$ identifies the moving object, including its unique identifier and type (e.g., person, vehicle, animal);
$trajID$ is the unique trajectory identifier;
$location$ denotes the geographical coordinates (latitude and longitude) of the POI;
$name$ and $category$ specify the POI’s descriptive metadata;
$time$ indicates the timestamp at which the mobile object visited $P_{i}$;
$asp = \langle a_{1}, a_{2}, \dots, a_{k} \rangle$ is the set of semantic aspects associated with the POI.

For conciseness, we henceforth refer to a multi-aspect POI-based semantic trajectory as simply a semantic trajectory.
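One possible in-memory reading of Definition 1 is sketched below in Python. The class names, field names, sample coordinates (approximate), and aspect values are assumptions made for illustration only; the definition itself does not prescribe any particular data structure.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class MovingObject:
    id: str
    type: str                        # e.g., "person", "vehicle", "animal"


@dataclass(frozen=True)
class Point:
    mo: MovingObject                 # MO
    traj_id: str                     # trajID
    location: Tuple[float, float]    # (latitude, longitude)
    name: str
    category: str
    time: datetime
    asp: Tuple[str, ...]             # aspect values a_1 ... a_k


def make_trajectory(points: List[Point]) -> List[Point]:
    """Return the points in chronological order, as Definition 1 requires."""
    if len({(p.mo, p.traj_id) for p in points}) > 1:
        raise ValueError("all points must share one moving object and trajID")
    return sorted(points, key=lambda p: p.time)


# Illustrative data only (hypothetical visit times and aspect values).
mo = MovingObject(id="u1", type="person")
p_late = Point(mo, "t1", (40.7484, -73.9857), "Empire State Building",
               "landmark", datetime(2012, 11, 3, 15, 0), ("rain", "67.4"))
p_early = Point(mo, "t1", (40.7829, -73.9654), "Central Park",
                "park", datetime(2012, 11, 3, 10, 0), ("clear", "69.1"))
at = make_trajectory([p_late, p_early])
```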

A collection of semantic trajectories $\{AT_{1}, AT_{2}, \dots, AT_{m}\}$ serves as the input for the ETL process, which integrates this enriched trajectory data into an STrDW. An STrDW extends the classical data warehouse (DW) architecture by incorporating spatial, non-spatial, and semantic dimensions.

Conceptually, a DW is modeled as a multidimensional data cube, comprised of fact and dimension tables. Each dimension may contain one or more hierarchies, which are composed of levels representing varying granularities of data.

Each level consists of a set of members, each characterized by a set of attributes. Fact records reference one member from each dimension and are associated with measures, which are quantitative variables that can be aggregated (e.g., distance, duration, count).

A dimension is classified as a spatial dimension if it includes at least one spatial hierarchy [11]. For instance, a spatial hierarchy can be defined as $H_{j}$ = municipality < region < state, where the symbol “<” denotes a topological containment or generalization relationship between hierarchical levels.
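To make the containment relationship concrete, the sketch below encodes a spatial hierarchy of this shape as parent links and uses it to generalize a visit count from municipalities to regions and states. The member names (`M1`, `R1`, `S1`, and so on), the counts, and the helper functions are all hypothetical placeholders.

```python
from collections import defaultdict

# Levels of the assumed hierarchy, from finest to coarsest.
LEVELS = ["municipality", "region", "state"]

# Hypothetical parent links: each member is contained in exactly one parent.
PARENT = {
    "municipality": {"M1": "R1", "M2": "R1", "M3": "R2"},
    "region": {"R1": "S1", "R2": "S1"},
}


def generalize(member, from_level, to_level):
    """Follow the containment links upward from one level to another."""
    start, stop = LEVELS.index(from_level), LEVELS.index(to_level)
    if stop < start:
        raise ValueError("can only generalize upward in the hierarchy")
    for level in LEVELS[start:stop]:
        member = PARENT[level][member]
    return member


# Hypothetical measure: number of visits recorded per municipality.
VISITS = {"M1": 10, "M2": 5, "M3": 7}


def roll_up(visits, to_level):
    """Aggregate the visit counts at a coarser spatial level."""
    totals = defaultdict(int)
    for municipality, count in visits.items():
        totals[generalize(municipality, "municipality", to_level)] += count
    return dict(totals)


by_region = roll_up(VISITS, "region")
by_state = roll_up(VISITS, "state")
```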

In contrast, the semantic dimension stores the semantic enrichment data associated with trajectories. Multiple STrDW models have been proposed to represent diverse types of semantic dimensions. These include models that capture event-based semantics along trajectories [19], representations based on the 5W1H framework (Who, What, When, Where, Why, and How) [12,13], and models that encode specific contextual attributes such as weather conditions, transportation modes, and device types used during movement [3].

These multidimensional structures support powerful analytical operations, enabling rich and flexible querying of spatiotemporal and contextual mobility data for decision support, behavior analysis, and pattern discovery.

3. Related Work

One of the foremost challenges in contemporary trajectory analytics lies in the semantic enrichment of raw trajectory data [20]. Various methodologies have been proposed to address this issue by incorporating annotations into spatiotemporal data streams. For instance, VISTA facilitates visual analysis of vessel trajectories, allowing domain experts to manually annotate trajectory segments [21]. In contrast, ANALYTiC employs machine learning techniques to automatically generate semantic labels over trajectory datasets [22]. Similarly, Zheng et al. [23] infer transportation modes by analyzing mobility features such as velocity, acceleration, directionality, and stop rates. Yan et al. [24] extend this approach by utilizing a geographic map correspondence algorithm to deduce transportation means, categorize visited geographical regions such as residential and commercial areas, and identify points of interest types including home, office, and marketplace.

An influential model for semantic representation is the 5W1H framework, widely adopted in journalism and adapted for mobility analysis. This model characterizes trajectories through six dimensions: Who, representing the identifier of the moving object; Where, indicating the spatial location; When, specifying the temporal marker; What, describing the activity or object involved; Why, reflecting the purpose of the movement; and How, denoting the mode of transportation or movement pattern, such as solitary or group-based behavior [25].

Research works such as Baquara [7] and CONSTAnT [26] have operationalized the 5W1H model to semantically represent mobility data. Baquara provides a framework for trajectory enrichment through ontological annotations, whereas CONSTAnT introduces a conceptual model that leverages classes, relationships, and semantic attributes (e.g., context, mobile object properties, and domain-specific events) to facilitate rich semantic interpretation and advanced mobility analysis.

MASTER [8] proposes a generic conceptual model that extends beyond 5W1H by representing multiple aspects of trajectories using RDF graphs. In this model, an aspect is any real-world fact relevant to interpreting movement behavior. Despite their expressiveness, these models operate within the paradigm of MODs and are not inherently designed for TDW.

Initial work on TDWs, such as that by Leonardi et al. [27,28], focused on spatial and temporal formalization of trajectory data. These models aimed to compute aggregate measures efficiently and leverage visualization for behavioral pattern analysis, without incorporating semantic annotations. Mob-Warehouse [13] constituted a significant advancement by proposing a conceptual model for semantic trajectory data warehouses (STrDWs) grounded in the 5W1H framework, in which the fact table incorporates two primary measures: the duration and the displacement distance between trajectory points.

Manaa and Akaichi [29] introduced a global ontology-based framework for the design, integration, and analysis of semantic trajectory data in a DW environment. Their approach supports the creation of a multidimensional ontology, encompassing dimensions, facts, and measures, to consolidate heterogeneous data sources. Similarly, Fileto et al. [2] extended Mob-Warehouse by incorporating semantic data harvested from Twitter, Foursquare, and LinkedGeoData.

Recent advancements further explore semantic trajectory modeling and querying. Wu et al. [30] propose a composite similarity metric integrating spatial, temporal, and semantic dimensions using WordNet for keyword-based queries. Sun et al. [31] introduce ST-LSTM, a deep learning model that predicts future locations by learning embeddings of semantic place attributes and user behavior. Almeida et al. [32] present SETHE, a text-based trajectory similarity framework leveraging vector space models for POIs, outperforming traditional SPARQL-based querying. Garani et al. [33] propose S-TrODW, a logical data warehouse model employing nested object-relational tables to represent trajectory segments. Luo et al. [34] utilize large language models (LLMs) to infer fine-grained semantics from raw GPS data, such as user occupation and movement narratives. Seep [35] applies extended finite state machines (EFSMs) for modeling semantic trajectory patterns, incorporating k-means clustering to discover semantically coherent trajectory groups. Pugliese et al. [36,37,38] propose tools such as MAT-Builder for trajectory enrichment and MAT-Sum for summarizing contextual information from raw mobility data to enhance trajectory representation. These tools support diverse use cases, including mobility behavior classification and routine pattern recognition. Hamann & Hagen [39] develop an unsupervised machine learning approach for inferring trip purposes from unlabeled GPS trajectories, enabling scalable mobility analysis without manual annotations.

Despite these developments, all existing STrDW implementations primarily rely on fixed semantic schemas. This rigid structure limits their applicability in scenarios that demand more granular or domain-specific semantic descriptors, particularly in the context of multi-aspect trajectory representations. Notably, these research efforts do not address the evolution of semantic aspect schemas.

To the best of our knowledge, the model proposed in this work is the first STrDW capable of accommodating an extensible set of semantic aspects without necessitating modifications to the underlying schema. By eliminating the need to redesign the schema with each new aspect type, our proposed STrDW substantially improves the maintainability, scalability, and generalizability of semantic trajectory warehousing systems, enabling smooth evolution of schemas, particularly with respect to semantic aspects. This architectural flexibility enables our model to accommodate heterogeneous semantic attributes across diverse application domains, thereby significantly advancing the state of the art in STrDWs. Our semantic extensibility is achieved through a unified, text-based representation of semantic aspects, allowing the flexible integration of diverse contextual information. Rather than encoding each semantic dimension separately within the schema, all contextual annotations, regardless of type or source, are stored as free-form text. This design simplifies the model structure and enables the use of NLP techniques for advanced trajectory analytics, supporting expressive semantic queries beyond traditional OLAP.

4. The Aspect Trajectory Data Warehouse Model

This section introduces the proposed model, designated as ATrDW, which enables the representation of diverse semantic aspects without requiring the creation of additional dimension tables for each new aspect type.

In the ATrDW architecture, semantic extensibility is achieved by consolidating all aspect-related information into a single, unified dimension called the AspectDimension. This model includes at least four core dimensions: (i) a spatial dimension capturing the geolocation of each trajectory point, (ii) a temporal dimension representing timestamps, (iii) a moving object dimension identifying the mobile entity, and (iv) the AspectDimension, which encapsulates contextual or semantic data associated with each trajectory point.
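The following sketch illustrates the idea of a single text-based aspect dimension using Python's built-in `sqlite3` module. The table and column names (`space_dim`, `time_dim`, `mo_dim`, `aspect_dim`, `types`, `vals`, `visits`) are assumptions chosen for this example and are not the schema of the ATrDW itself; the point is only that a new aspect type changes the stored text, not the table definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE space_dim  (space_id  INTEGER PRIMARY KEY, poi_name TEXT, lat REAL, lon REAL);
CREATE TABLE time_dim   (time_id   INTEGER PRIMARY KEY, ts TEXT);
CREATE TABLE mo_dim     (mo_id     INTEGER PRIMARY KEY, mo_type TEXT);
-- Aspect types and aspect values are both stored as comma-separated text.
CREATE TABLE aspect_dim (aspect_id INTEGER PRIMARY KEY, types TEXT, vals TEXT);
CREATE TABLE fact (
    space_id  INTEGER REFERENCES space_dim(space_id),
    time_id   INTEGER REFERENCES time_dim(time_id),
    mo_id     INTEGER REFERENCES mo_dim(mo_id),
    aspect_id INTEGER REFERENCES aspect_dim(aspect_id),
    visits    INTEGER
);
""")


def columns(table):
    """Return the column names of a table via PRAGMA table_info."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


before = columns("aspect_dim")
conn.execute("INSERT INTO aspect_dim (types, vals) VALUES (?, ?)",
             ("weather,temperature", "clear,69.1"))
# A new aspect type (heartbeat) appears: only the stored text grows.
conn.execute("INSERT INTO aspect_dim (types, vals) VALUES (?, ?)",
             ("weather,temperature,heartbeat", "rain,67.4,60"))
after = columns("aspect_dim")
n_rows = conn.execute("SELECT COUNT(*) FROM aspect_dim").fetchone()[0]
```

In a conventional design, the second insert would have required a new heartbeat dimension table and a new foreign key in the fact table.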

Formally, the AspectDimension is defined as follows:

Definition 2.

Let $AD = \{m_{1}, m_{2}, \dots, m_{k}\}$ be a set of aspect tuples, where each $m_{j} = \langle \Gamma, V \rangle$, with $\Gamma = \langle \lambda_{1}, \lambda_{2}, \dots, \lambda_{n} \rangle$ representing the set of aspect types, and $V = \langle \nu_{1}, \nu_{2}, \dots, \nu_{n} \rangle$ the corresponding aspect values such that $|\Gamma| = |V|$ and $\forall \nu_{i} \in V$, $\exists \lambda_{i} \in \Gamma$.

The AspectDimension is designed to model contextual information associated with a mobile object’s behavior. For example, consider a semantic trajectory representing a tourist visiting multiple POIs within New York City. At the time of visiting Central Park, the associated context comprises weather conditions marked as “clear”, a temperature of 69.1 °F, a heart rate of 70 bpm, and a transportation mode identified as “subway”. These values are modeled as a single entry in the AspectDimension:

$m_{1} = \langle \langle \text{weather}, \text{temperature}, \text{heartbeat}, \text{transport} \rangle, \langle \text{clear}, 69.1, 70, \text{subway} \rangle \rangle$
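A minimal Python sketch of one aspect tuple from Definition 2 is given below, including the constraint $|\Gamma| = |V|$ and a comma-separated text form. The class name `AspectTuple`, its methods, and the choice of a comma as separator are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AspectTuple:
    types: Tuple[str, ...]     # Gamma: aspect types
    values: Tuple[str, ...]    # V: aspect values, kept as text

    def __post_init__(self):
        # Definition 2 requires one aspect type per aspect value.
        if len(self.types) != len(self.values):
            raise ValueError("|Gamma| must equal |V|")

    def as_text(self):
        """Serialize both sequences as comma-separated text."""
        return ",".join(self.types), ",".join(self.values)

    def value_of(self, aspect_type):
        """Look up the value stored at the position of the given type."""
        return self.values[self.types.index(aspect_type)]


m1 = AspectTuple(
    types=("weather", "temperature", "heartbeat", "transport"),
    values=("clear", "69.1", "70", "subway"),
)
gamma_text, v_text = m1.as_text()

try:
    AspectTuple(types=("weather",), values=("clear", "69.1"))
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```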

As the tourist continues her journey, subsequent aspect entries are generated. For example, an AspectDimension instance $AD_{ex}$ may include:

$\Gamma = \langle \text{weather}, \text{temperature}, \text{heartbeat}, \text{transport} \rangle$

$V_{1} = \langle \text{clear}, 69.1, 70, \text{subway} \rangle$

$V_{2} = \langle \text{rain}, 67.4, 60, \text{taxi} \rangle$

$V_{3} = \langle \text{cloud}, 68.5, 65, \text{walk} \rangle$

This concise and adaptable representation facilitates the seamless integration of heterogeneous semantic data without necessitating modifications to the underlying schema. Figure 3 illustrates the integration of the AspectDimension within a multidimensional data cube. The cube includes three dimensions: POI, time, and aspect, and a single measure: the number of visitors. Notably, only the values $V_{i}$ of the AspectDimension tuples are used in the cube’s analytical operations.

This structure enables expressive and semantically rich queries. For instance, one may query: “How many visits were recorded at the Empire State Building during rainy days in November?” or “How many individuals visited Central Park when the temperature exceeded 68 °F and the weather conditions were clear?” Such queries are supported by applying pattern matching and textual filtering techniques over the AspectDimension values. These techniques are further detailed in Section 4.1.
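As a rough illustration of the second query, the sketch below filters hypothetical fact rows by their textual aspect values. It uses plain string splitting as a stand-in for the pattern matching techniques of Section 4.1, and the rows, visitor counts, and helper names are invented for the example.

```python
# Aspect types shared by all rows (Gamma), stored as text.
TYPES = "weather,temperature,heartbeat,transport"

# Hypothetical fact rows: (POI name, aspect values as text, number of visitors).
FACTS = [
    ("Central Park", "clear,69.1,70,subway", 3),
    ("Central Park", "rain,67.4,60,taxi", 2),
    ("Central Park", "clear,67.0,72,walk", 4),
    ("Empire State Building", "cloud,68.5,65,walk", 5),
]


def aspect(values_text, aspect_type):
    """Return the value at the position of aspect_type in Gamma."""
    position = TYPES.split(",").index(aspect_type)
    return values_text.split(",")[position]


def visitors(poi, predicate):
    """Sum the visitor measure over the rows of a POI matching a predicate."""
    return sum(n for name, vals, n in FACTS if name == poi and predicate(vals))


# "How many individuals visited Central Park when the temperature
#  exceeded 68 °F and the weather conditions were clear?"
answer = visitors(
    "Central Park",
    lambda v: aspect(v, "weather") == "clear"
    and float(aspect(v, "temperature")) > 68,
)
```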

4.1. Pattern Matching

Given the heterogeneous nature of values encapsulated within the AspectDimension, it becomes essential to isolate and extract each individual element $\nu_{i} \in V$. To achieve this, we leverage regular expressions (regex), a foundational technique in NLP for identifying patterns within textual data [40]. For instance, to extract the first aspect type $\lambda_{1} \in \Gamma$ and its corresponding $\nu_{1} \in V_{1}$, the following application of the substring function with regular expressions can be employed:

substring($\Gamma$ from '([+-]?\w+[.]?\w*,)') = “weather,”

substring($V_{1}$ from '([+-]?\w+[.]?\w*,)') = “clear,”

The regular expression [+-]?\w+[.]?\w* defines a pattern capable of matching both alphanumeric tokens and numeric values, including those with optional signs and decimal points. This expression can be used to identify textual elements ranging from categorical labels (e.g., “weather”) to numerical values such as 70, +5.3, or −1.0. For conciseness, we denote this pattern as $\delta$ = [+-]?\w+[.]?\w*.
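The behavior of this pattern can be checked with a short Python sketch. Python's `re` module is used here only as a convenient stand-in for the SQL substring function, and the sample strings are chosen for illustration.

```python
import re

# The token pattern from the text: optional sign, word characters,
# optional dot, optional trailing word characters.
DELTA = r"[+-]?\w+[.]?\w*"

samples = ["weather", "70", "+5.3", "-1.0", "69.1", "1.2.3", "a,b", ""]

# True when the whole string is a single token according to DELTA.
is_token = {s: re.fullmatch(DELTA, s) is not None for s in samples}

# First element followed by its separator, mirroring the substring calls.
first_type = re.match("(" + DELTA + ",)",
                      "weather,temperature,heartbeat,transport").group(1)
first_value = re.match("(" + DELTA + ",)", "clear,69.1,70,subway").group(1)
```

Note that the pattern allows at most one dot, so a string such as `1.2.3` is not accepted as a single token.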

To extract the first value from a comma-separated string sequence, one may simply retrieve the substring that precedes the first delimiter (,). However, accessing an element at an arbitrary position i within the sequence necessitates a more refined approach, as it involves counting and matching the i-th delimited entry.

To perform this counting in a textual representation of a sequence of aspects, we use a regular expression construct known as lookbehind. The lookbehind expression is written in the form (?<=Y)X, which specifies that the query should match an element X that is preceded by a pattern Y. For example, to extract the heartbeat value, which is the third element $\nu_{3}$ of $V$, we can verify that it is preceded by two elements and their separators, using a regular expression such as

(?<=($\delta$,$\delta$,))($\delta$), where Y = ($\delta$,$\delta$,) and X = ($\delta$).

Any value $\lambda_{i} \in \Gamma$ or $\nu_{i} \in V$ can be extracted using Algorithm 1. This algorithm describes the behavior of the regexLookbehind() function. It takes as input the index of the element to be extracted and the corresponding sequence ($\Gamma$ or $V$). The function (lines 4 to 8) dynamically constructs a lookbehind regular expression, where the number of repeated $\delta$ patterns depends on the index parameter $i$. Finally, in line 9, the substring function applies the constructed lookbehind expression to extract the element at the specified position $i$ from the sequence.

Algorithm 1 Create lookbehind expression.

1: function regexLookbehind(i, sequence)
2:     S ← sequence
3:     δ ← '[+-]?\w+[.]?\w*'
4:     regex ← empty string
5:     for index = 1 to i − 1 do
6:         regex ← regex + δ + ','
7:     end for
8:     regex ← '(?<=(' + regex + '))' + δ
9:     return substring(S from regex)
10: end function
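As a rough illustration of what Algorithm 1 computes, the following Python sketch extracts the i-th element of a comma-separated sequence. It is not the authors' implementation: Python's built-in `re` module only accepts fixed-width lookbehind, so the sketch swaps the lookbehind for an anchored prefix `^(?:δ,){i-1}` followed by a capturing group. The function name `extract_at`, the single-word aspect names in `gamma`, and the absence of spaces after commas are assumptions made for the example.

```python
import re

# Element pattern δ from the text: optional sign, word characters, optional decimal part.
DELTA = r"[+-]?\w+[.]?\w*"


def extract_at(i, sequence):
    """Return the i-th (1-based) element of a comma-separated sequence, or None."""
    if i < 1:
        raise ValueError("i must be >= 1")
    # Skip the i-1 preceding elements and their separators, then capture element i.
    pattern = rf"^(?:{DELTA},){{{i - 1}}}({DELTA})"
    match = re.search(pattern, sequence)
    return match.group(1) if match else None


gamma = "weather,temperature,heartbeat,transport"  # hypothetical aspect names
v1 = "clear,69.1,70,subway"

print([extract_at(i, gamma) for i in range(1, 5)])
print([extract_at(i, v1) for i in range(1, 5)])
```

Anchoring at the start of the string plays the same role as the lookbehind prefix: element i is only captured after exactly i − 1 elements and separators have been consumed.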

Table 1 illustrates how to use the function from Algorithm 1 to extract all values from $AD_{ex}$. These extracted values are presented in a tabular format, as shown in Table 2. First, the query retrieves the names of each aspect in Γ, which form the header of Table 2. The first, second, and third rows of Table 2 correspond to the results of applying the regexLookbehind function to $V_{1}$, $V_{2}$, and $V_{3}$, respectively.

Operations for extracting values from textual representations are especially valuable when the objective is to display specific aspect values or conduct comparative analysis. For example, consider the query: “What are the most frequently used means of transportation when the temperature exceeds 60 °F?” In this case, it is necessary to first extract the temperature aspect value to evaluate it against the 60 °F threshold and subsequently extract the corresponding value of the means of transportation aspect. However, there are scenarios in which explicit value extraction is not required, as certain queries can be resolved using pattern-matching operations alone. For example, in the query “How many individuals arrived at college by subway on a clear day?” two aspects must be evaluated: the transportation mode being “subway” and the weather condition being “clear.” These aspects can be verified using the SQL LIKE operator. Table 3 illustrates how to construct expressions with the SQL LIKE operator to match the weather and transportation aspects within the tuple $V_{1}$ = 〈clear, 69.1, 70, subway〉. The first row demonstrates the expression used to verify whether the weather is “clear,” while the second row checks if the transportation mode is “subway.” The second column in the table presents the outcome of evaluating each expression.
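The first query above (transport modes when the temperature exceeds 60 °F) needs extraction followed by a numeric comparison. A minimal Python sketch of that two-step evaluation is given below. It assumes the aspect order weather, temperature, heartbeat rate, transport; the helper names are hypothetical, and the last two sample rows are invented for illustration rather than taken from the paper's tables.

```python
import re
from collections import Counter

DELTA = r"[+-]?\w+[.]?\w*"


def extract_at(i, sequence):
    """Return the i-th (1-based) comma-separated element, or None."""
    m = re.search(rf"^(?:{DELTA},){{{i - 1}}}({DELTA})", sequence)
    return m.group(1) if m else None


def transport_counts_above(rows, threshold):
    """Count transport modes (position 4) for rows whose temperature (position 2) exceeds threshold."""
    counts = Counter()
    for row in rows:
        temperature = float(extract_at(2, row))  # extract first, then compare numerically
        if temperature > threshold:
            counts[extract_at(4, row)] += 1
    return counts


rows = [
    "clear,69.1,70,subway",
    "rain,67.4,60,taxi",
    "snow,30.2,65,subway",    # hypothetical row, below the threshold
    "clouds,72.0,80,subway",  # hypothetical row
]
print(transport_counts_above(rows, 60.0).most_common())
```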

Nonetheless, the SQL LIKE operator lacks support for regular expressions and provides only a limited set of wildcard characters, which restricts its effectiveness in identifying complex patterns within textual data. Although LIKE can be conveniently employed to validate values at the boundaries of a sequence (specifically $v_{1}$ and $v_{n}$), it is inadequate for accessing intermediate values $v_{i}$ where 1 < i < n. For example, in Table 3, the aspects “weather” and “means of transportation” correspond to the first and last elements of $V_{1}$, respectively. Attempting to evaluate intermediate values using LIKE may lead to incorrect results.

Consider the query: “What are the most frequently used means of transportation on days when the ambient temperature is 60 °F?” Evaluating this using the expression $V_{2}$ LIKE '%60%', where $V_{2}$ = 〈rain, 67.4, 60, taxi〉, would mistakenly match the “heartbeat rate” rather than the intended “temperature” aspect, thereby yielding a false positive.
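The boundary checks and the false positive described above can be reproduced with a short Python sketch. It uses the standard-library `sqlite3` module as a stand-in SQL engine (the paper's queries target PostgreSQL, and SQLite's LIKE is case-insensitive for ASCII by default). The LIKE patterns, the `like` helper, and the textual form of the sequence without spaces are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def like(value, pattern):
    """Evaluate `value LIKE pattern` in SQLite and return the result as a bool."""
    return bool(conn.execute("SELECT ? LIKE ?", (value, pattern)).fetchone()[0])


v2 = "rain,67.4,60,taxi"

print(like(v2, "rain,%"))  # True: first element (weather) can be checked at the boundary
print(like(v2, "%,taxi"))  # True: last element (transport) can be checked at the boundary
print(like(v2, "%60%"))    # True, although the temperature is 67.4: the heartbeat rate matched
```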

Performing such checks accurately requires the use of operators or functions that support Boolean regular expression matching. One such example is the ~ operator in PostgreSQL, which enables regex-based evaluation. Table 4 demonstrates how to access $v_{2}$ and $v_{3}$ of the $V_{2}$ sequence using regular expressions. The first column describes the semantics of the operation, the second provides the corresponding SQL expression, and the third shows the evaluation result. The first row in Table 4 verifies whether $v_{2}$ = 60, while the second row checks whether $v_{3}$ = 60. The regular expression “^δ,” instructs the query engine to skip over $v_{1}$, and “^δ,δ,” directs it to bypass both $v_{1}$ and $v_{2}$, enabling accurate extraction of the desired values.
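The positional check can be sketched in Python with the `re` module, mirroring the anchored “^δ,” prefixes described above. This is an illustrative analogue of a Boolean regex match, not the SQL shown in Table 4; the function name `value_at_equals` and the trailing `(?:,|$)` guard, which prevents a match on a mere prefix of the element, are assumptions.

```python
import re

DELTA = r"[+-]?\w+[.]?\w*"


def value_at_equals(sequence, i, expected):
    """Return True if the i-th (1-based) comma-separated element equals `expected`."""
    skip = rf"(?:{DELTA},){{{i - 1}}}"  # bypass v_1 .. v_(i-1) and their separators
    pattern = rf"^{skip}{re.escape(expected)}(?:,|$)"
    return re.search(pattern, sequence) is not None


v2 = "rain,67.4,60,taxi"
print(value_at_equals(v2, 2, "60"))  # False: the temperature is 67.4
print(value_at_equals(v2, 3, "60"))  # True: the heartbeat rate is 60
```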

4.2. The ATrDW Logical Model

After establishing how multiple aspects can be represented within a single dimension and detailing the extraction of Γ and V values, this subsection outlines the integration of the AspectDimension within the ATrDW model. Figure 4 illustrates the multidimensional structure of the ATrDW, which adheres to the star schema architecture [41]. The model comprises four dimensions and a central fact table. The notation * in the diagram means that each tuple in the dimension tables can be related to many tuples in the fact table.

The first dimension represents the moving object and is modeled by the MovingObject entity. The temporal dimension is modeled by the Time entity, which records the timestamp at which the moving object arrived at a specific POI, $P_{i}$, within the semantic trajectory. The spatial dimension is captured by the POI entity. The Aspect entity models the AspectDimension.

The AspectDimension serves as an additional aggregation criterion for the measures recorded in the fact table. Unlike conventional dimensions, the AspectDimension provides flexibility regarding both the types and number of values it can accommodate. This is achieved by associating each dimension member with a sequence of aspect types, denoted by Γ, which defines the semantic structure of the corresponding aspect values. This design allows heterogeneous types of semantic aspects to be represented compactly within a single attribute.

The fact table stores the measures associated with each event, where a moving object is observed at a specific POI and time, along with its corresponding semantic context captured in the AspectDimension.

The measures stored in the fact table are as follows:

num_trajectory: identifies the trajectory to which the POI belongs;
distance: represents the distance between the current POI and its immediate predecessor. This value is set to zero for the first POI in the trajectory;
total_distance: denotes the cumulative distance from the starting POI of the trajectory to the current POI. This value is zero for the initial POI;
duration: represents the elapsed time between the current POI and its immediate predecessor. This value is zero for the first POI in the trajectory;
total_duration: denotes the total elapsed time from the beginning of the trajectory to the current POI. This value is zero for the initial POI;
position: specifies the ordinal position of the POI within the trajectory sequence.

5. An Analytical Semantic Trajectory Framework Architecture

This section presents the framework responsible for transforming semantically enriched trajectories into the proposed ATrDW model. To assess the effectiveness of our framework, we utilized two distinct datasets. The first is derived from the geosocial network Foursquare [9], comprising user trajectory data from the states of New York and New Jersey in the United States. The second dataset originates from the TripBuilder project [15] and consists of user trajectories in Italy, constructed by integrating geotagged Flickr data with semantic information from Wikipedia.

The ATrDW Framework Architecture

The proposed framework is capable of ingesting semantic trajectory data from heterogeneous datasets and generating a data warehouse according to the logical schema illustrated in Figure 4. Each dataset may include a distinct set of semantic aspects, and the framework is designed to accommodate and manage this variability effectively. To address diverse application scenarios, the framework provides two extension points: the first is the InputPOI interface, and the second is the AspectDAO interface. The overall architecture of the framework is depicted in Figure 5.

The InputPOI interface is responsible for ingesting semantic trajectory data. Each application must implement this interface to enable the transformation of native data structures into a format that is both compatible with the framework and compliant with the ATrDW schema specification. The nextPOI() function is invoked by the framework to iterate over each point of interest (POI) in the data source. It returns an object of type Message, an abstract Java class that encapsulates the necessary information for constructing instances of the ATrDW. The encapsulated information includes categories, POI names, location details (city, state, and country), and timestamps. The Message class defines two key abstract methods essential to the ETL process: getAspects(), which returns the available aspects, and getAspectValue(String asp), which retrieves the value of a specified aspect.

The implementation of these methods depends on the specific context in which the framework is applied. For instance, the TripBuilder and Foursquare datasets rely on different sets of aspects. In the Foursquare dataset, aspects include weather, rating, price, and day, while TripBuilder focuses solely on the aspect related to the means of transport. When extending the framework, developers must implement the getAspects() and getAspectValue(String asp) methods in accordance with the requirements of the target scenario.

The component diagram in Figure 5 depicts the application of the framework in two distinct scenarios: processing trajectories from the Foursquare and TripBuilder datasets. The ETL component interacts with implementations of the InputPOI interface to extract and transform the information associated with each point of interest (POI) from the input datasets. It then populates the corresponding dimensions of the ATrDW and computes the measures to be stored in the fact table. After completing these operations, the DataWarehouseManager component is responsible for persisting the processed data into the target database.

The second extension point of the proposed framework is the AspectDAO interface, which is invoked by the data warehouse (DW) manager during the persistence of aspect data in the database. This interface defines the set of functions that must be implemented for each specific scenario, enabling the ETL component to construct the corresponding AspectDimension in the database according to scenario-specific requirements.

For each scenario (Foursquare, TripBuilder, or any other dataset), the developer must implement two functions. The columnsAspectsId() function defines the sequence of aspect types (the Γ sequence), while the putAspectsValues() function is called by the DW manager to persist the aspect values (the V sequence) within the ATrDW.

The implementation of putAspectsValues() is straightforward: it involves converting the aspects into textual format by retrieving each aspect and its corresponding value from the Message object. Once the aspect types and their values are obtained as text, the DWManager component is responsible for storing this data in the ATrDW.
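To make the two extension points more concrete, the sketch below mimics them in Python. The framework itself is described as Java, so this is only an analogue under stated assumptions: `FoursquareMessage` is a hypothetical dict-backed implementation, and `columns_aspects_id` and `put_aspects_values` are written as free functions that return text rather than as methods of an AspectDAO implementation that writes to a database.

```python
from abc import ABC, abstractmethod


class Message(ABC):
    """Python analogue of the abstract Message class described in the text."""

    @abstractmethod
    def get_aspects(self):
        """Return the names of the available aspects, in order."""

    @abstractmethod
    def get_aspect_value(self, asp):
        """Return the value of the aspect named `asp`."""


class FoursquareMessage(Message):
    """Hypothetical scenario-specific implementation backed by a plain dict."""

    def __init__(self, aspects):
        self._aspects = dict(aspects)  # insertion order defines the aspect order

    def get_aspects(self):
        return list(self._aspects)

    def get_aspect_value(self, asp):
        return self._aspects[asp]


def columns_aspects_id(message):
    """Textual form of the aspect-type sequence (the Γ sequence)."""
    return ",".join(message.get_aspects())


def put_aspects_values(message):
    """Textual form of the aspect-value sequence (the V sequence)."""
    return ",".join(str(message.get_aspect_value(a)) for a in message.get_aspects())


msg = FoursquareMessage({"weather": "clear", "rating": 8, "price": 2})
print(columns_aspects_id(msg))
print(put_aspects_values(msg))
```

Because both sequences are derived from the same ordered list of aspect names, position i in the value text always corresponds to position i in the type text, which is what the positional regular expressions of Section 4 rely on.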

Algorithm 2 provides a detailed description of the ETL process workflow. To facilitate the computation of fact measures, certain trajectory-related data are temporarily stored in a status set defined as status = {firstPOI, lastPOI, d, D, du, Du, pos}, where the elements represent:

firstPOI: the initial point of interest (POI) in the trajectory;
lastPOI: the most recent POI processed in the trajectory;
d: the spatial distance between lastPOI and the preceding POI;
D: the cumulative distance traveled by the mobile object up to lastPOI;
du: the elapsed time between lastPOI and its preceding POI;
Du: the total elapsed time from firstPOI to lastPOI;
pos: the ordinal position of lastPOI within the trajectory sequence.

The status values are updated incrementally as the ETL process ingests new trajectory points. All trajectory-related statuses are maintained in a dictionary structure, denoted as Dstatus, where the primary key corresponds to the trajectory identifier (trajID). Algorithm 2 processes one point of interest (POI) at a time and is invoked whenever the data input component receives a new POI.

The algorithm takes as input an instance of InputPOI, the Dstatus dictionary representing the current status of each trajectory, and the DWManager responsible for persisting data into the warehouse. The InputPOI instance provides semantically enriched trajectory data, already sorted and organized according to Definition 1.

In lines 2 and 3, the algorithm iterates over each POI P while input data is available. From lines 4 to 7, the algorithm extracts the necessary information to construct each dimension within the ATrDW. Each dimension provides a method responsible for generating its hierarchy, levels, and members based on the current POI P. These methods return the primary keys for their respective dimensions, which are subsequently used as foreign keys in the fact table.

Algorithm 2 ETL algorithm

1: Input: InputPOI in, Dictionary Dstatus, DWManager DWM
2: while in.hasPOI() do
3:     P ← in.nextPOI()
4:     pkMoDim ← MODimension(P.MO)
5:     pkTimeDim ← TimeDimension(P.time)
6:     pkPoiDim ← POIDimension(P.name, P.category, P.location)
7:     pkAspDim ← AspectDimension(P.Asp)
8:     status ← Dstatus[P.trajID]
9:     if status == NULL then
10:         status ← new status with all measures set to zero; status.firstPOI ← P
11:         status.pos ← 1
12:         Dstatus[P.trajID] ← status
13:     else
14:         firstP ← status.firstPOI
15:         lastP ← status.lastPOI
16:         status.d ← calcDistance(P.location, lastP.location)
17:         status.D ← status.D + status.d
18:         status.du ← calcDuration(P.time, lastP.time)
19:         status.Du ← calcDuration(P.time, firstP.time)
20:         status.pos ← status.pos + 1
21:     end if
22:     status.lastPOI ← P
23:     Measures ← status.d, status.D, status.du, status.Du, status.pos
24:     Dimensions ← pkMoDim, pkTimeDim, pkPoiDim, pkAspDim
25:     Fact ← Dimensions, Measures
26:     DWM.persist(Fact)
27: end while

In line 8, the algorithm retrieves the current status of the trajectory from the Dstatus dictionary. From lines 10 to 11, if no prior status exists for the given trajectory, a new status object is initialized with all measure values set to zero. In line 12, this new status is stored in the dictionary using trajID as the key.

From lines 14 to 20, the status object is updated with new values derived from the latest trajectory point. In line 25, a Fact object is created, containing both the dimension keys and the computed measures. In line 26, the DWManager component persists the fact data (dimension references and associated measures) into the ATrDW schema.
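The status bookkeeping of Algorithm 2 (lines 8 to 23) can be sketched in Python as follows. This is a simplified illustration rather than the framework's code: the `POI` and `Status` classes are hypothetical, a planar Euclidean distance stands in for calcDistance, timestamps are plain numbers of seconds, and the dimension lookups and the persist call are omitted.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class POI:
    traj_id: int
    x: float
    y: float
    time: float  # seconds since an arbitrary origin (assumption)


@dataclass
class Status:
    first_poi: POI
    last_poi: POI
    d: float = 0.0   # distance from the preceding POI
    D: float = 0.0   # cumulative distance
    du: float = 0.0  # time since the preceding POI
    Du: float = 0.0  # time since the first POI
    pos: int = 1     # ordinal position in the trajectory


def update_status(d_status, p):
    """Update the status of p's trajectory and return its measures."""
    status = d_status.get(p.traj_id)
    if status is None:
        # First POI of the trajectory: all measures start at zero, position at 1.
        status = Status(first_poi=p, last_poi=p)
        d_status[p.traj_id] = status
    else:
        last = status.last_poi
        status.d = hypot(p.x - last.x, p.y - last.y)  # stand-in for calcDistance
        status.D += status.d
        status.du = p.time - last.time
        status.Du = p.time - status.first_poi.time
        status.pos += 1
    status.last_poi = p
    return (status.d, status.D, status.du, status.Du, status.pos)


d_status = {}
points = [POI(1, 0, 0, 0), POI(1, 3, 4, 60), POI(1, 3, 10, 150)]
facts = [update_status(d_status, p) for p in points]
print(facts)
```

Keeping one status object per trajectory identifier lets each POI be processed in constant time, without rereading earlier points of the same trajectory.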

Once the ATrDW construction process is complete, external clients with appropriate credentials may perform analytical queries over the warehouse using standard OLAP operations such as roll-up, drill-down, slice, and dice [41].

6. Experiments

This section presents a set of analytical queries used to validate the proposed ATrDW model. The evaluation was performed using two datasets: Foursquare and TripBuilder.

The Foursquare dataset comprises 3079 trajectories, 15,402 POIs, and 193 users. In this context, the Γ sequence is defined as 〈weather, rating, price〉. The weather domain includes the values clear, clouds, fog, rain, and snow. The rating represents a score from 0 to 10, assigned by the user to each visited POI, with higher scores indicating more favorable evaluations. The price is a score ranging from 1 to 4, where higher values reflect greater perceived cost to visit the POI, according to the user’s judgment.

In contrast, the TripBuilder dataset contains 55,474 trajectories and 2466 POIs. It includes a single aspect, the means of transport, with the corresponding Γ sequence defined as 〈transport〉.

All experiments were executed in a computing environment equipped with an Intel Core i7-7700 3.60 GHz processor, 32 GB of RAM, and a 500 GB hard disk drive, running the GNU/Linux Ubuntu 18.04 operating system.

The ETL process required approximately 7.9 min to extract, transform, and load the entire set of trajectories into the ATrDW. To evaluate the analytical capabilities of the proposed ATrDW framework, a total of 12 aggregation queries were formulated. These queries primarily focus on typical OLAP operations, including roll-up and drill-down, and aim to demonstrate the model’s ability to support analytical reasoning over semantic trajectory data.

The set of queries is presented in Table 5. They were inspired by analytical tasks proposed in prior research on STrDW [2,12,13]. Queries Q1, Q2, and Q11 are answered using aspects derived from the TripBuilder dataset, whereas the remaining queries are addressed using data from the Foursquare dataset. The queries are organized according to the type of analytical operation and semantic criterion involved.

Queries Q1 and Q2 illustrate drill-down operations from POI categories to specific POI names.
Queries Q3 and Q4 demonstrate roll-up operations, aggregating data from POI names to their corresponding categories.
Queries Q5 and Q6 perform temporal drill-down operations, disaggregating data from the year level to the semester level.
Queries Q7 and Q8 examine semantic transitions, specifically user movements between different POI categories.
Query Q9 analyzes mobility patterns involving trajectories that traverse three non-consecutive POIs.
Query Q10 investigates user movement between two POIs, with both locations filtered by name.
Query Q11 performs a similar analysis but filters the origin POI by name and the destination POI by category.
Query Q12 focuses on mobility involving a source POI filtered by name and a destination POI constrained by semantic aspects—specifically, rating and price.

These queries collectively validate the expressiveness and flexibility of the ATrDW in supporting complex semantic and spatiotemporal analysis tasks.

While the snowflake schema provides a robust mechanism for modeling complex hierarchical relationships, the star schema is generally more efficient for analytical query processing due to its simplified structure, which minimizes the number of join operations required—thereby improving query performance [39]. For this reason, we adopted the star schema for the physical implementation of the ATrDW, as illustrated in Figure 4. All analytical queries were implemented and executed against this schema.

To illustrate how analytical queries can be expressed using SQL, particularly those involving the pattern-matching techniques discussed in Section 4, we present three representative examples. Table 6 provides the complete SQL formulations for these queries. For instance, in Query Q3, the objective is to compute the number of trajectories occurring within the geographic boundaries of New York City, where the average speed exceeds 25 mph. The weather condition is incorporated as a filtering criterion using the SQL LIKE operator. As detailed in Section 4, this operator is suitable in the given context because the weather attribute occupies the first position within the AspectDimension tuple, enabling direct textual pattern matching.

Query Q8 analyzes trajectories that intersect two specific categories of POIs: Entertainment and Restaurant. To minimize structural complexity and enhance query clarity, a temporary table named entertainment was utilized. This table preselects all trajectories that include at least one entertainment-related POI. Using the position attribute, it becomes possible to determine whether the entertainment and restaurant POIs occur consecutively within the trajectory sequence. Furthermore, Q8 leverages the regexLookbehind() function to extract the evaluation score of a POI, which corresponds to the second aspect in the ordered sequence defined by the AspectDimension.

Query Q10 presents an additional use case involving the analysis of consecutive POIs; however, rather than filtering by category, it targets specific POI names. In this case, aspect values are evaluated using the SQL regular expression match operator ~, which supports full regex semantics for binary true/false evaluation.

All queries were executed ten times to ensure consistent performance metrics. Table 7 reports the average execution time for each query. As demonstrated, the time spent on the ETL process is offset by the efficient execution of analytical queries enabled by the ATrDW structure. Figure 6 ranks the queries in descending order of execution time. Notably, Q12, Q6, Q9, and Q11 exhibit the highest execution times, primarily due to their increased computational complexity and semantic filtering requirements.

7. Conclusions

Trajectory data can be semantically enriched through a wide variety of contextual information, commonly referred to as semantic aspects. Prior studies have consolidated semantic trajectory data within STrDWs, typically modeling each aspect as a dedicated dimension. However, the inclusion of multiple semantic aspects often results in highly complex warehouse schemas, which can hinder adaptability and scalability across diverse application domains.

This work introduces a novel and extensible multidimensional model, ATrDW, that simplifies the integration and representation of heterogeneous semantic aspects. Our ATrDW model represents all aspect types within a single, flexible dimension termed the AspectDimension. Aspects are encoded in free-form text, enabling their retrieval and analysis through pattern-matching techniques such as regular expressions, embedded within SQL queries. We demonstrated that our proposed design not only enhances schema maintainability but also facilitates semantically rich analytical queries without requiring structural modifications. Reproducing the approaches from related work proved difficult because their source code is unavailable and access to their data sources is limited. These constraints hindered our ability to perform a fair and consistent comparative study.

Our primary focus was on extending the aspect dimension, which we identified as the most variable and application-dependent. This dimension can encompass a wide range of data types, including physiological signals (e.g., heart rate, blood pressure, emotional state), environmental parameters (e.g., weather conditions, temperature), and socioeconomic information (e.g., income, user ratings, price evaluations, transportation modes). Given this high degree of heterogeneity and relevance to diverse application domains, we prioritized addressing the extensibility of this dimension. The core dimensions—namely temporal and spatial—remain statically defined and do not support extensible or dynamic schema evolution. In future work, we plan to enhance our framework by incorporating extensibility in both the spatial and temporal dimensions.

Another important direction for future work involves addressing the linguistic diversity of end users. In real-world applications—particularly those deployed in multilingual regions or global platforms—trajectory data analysis systems must exhibit robustness to variations in language, dialect, and syntactic expression. Ensuring semantic interoperability across linguistic boundaries is critical for supporting inclusive and globally accessible analytical environments. Recent advancements in NLP, including pre-trained multilingual language models and cross-lingual embedding techniques, offer promising solutions to mitigate language-dependent limitations. Integrating such models into the ATrDW framework would enhance its capacity to interpret and process semantically annotated data across diverse linguistic contexts, thereby improving usability, accessibility, and internationalization support.

Author Contributions

Conceptualization, Damião Ribeiro de Almeida and Cláudio de Souza Baptista; methodology, Damião Ribeiro de Almeida and Anselmo Cardoso de Paiva; software, Damião Ribeiro de Almeida; validation, Damião Ribeiro de Almeida, Cláudio de Souza Baptista and Fabio Gomes de Andrade; formal analysis, Damião Ribeiro de Almeida and Anselmo Cardoso de Paiva; investigation, Damião Ribeiro de Almeida, Cláudio de Souza Baptista and Fabio Gomes de Andrade; resources, Damião Ribeiro de Almeida; data curation, Damião Ribeiro de Almeida; writing—original draft preparation, Damião Ribeiro de Almeida; writing—review and editing, Fabio Gomes de Andrade and Cláudio de Souza Baptista; visualization, Damião Ribeiro de Almeida; supervision, Cláudio de Souza Baptista and Fabio Gomes de Andrade; project administration, Damião Ribeiro de Almeida. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

For the sake of reproducibility, the data and program files are available in figshare.com at the private link at https://figshare.com/s/ff7141b3593683c0b281 (accessed on 22 July 2025).

Acknowledgments

The second and fourth authors would like to express their gratitude to the Brazilian National Council for Scientific and Technological Development (CNPq) for the research scholarships that supported this work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Kong, X.; Li, M.; Ma, K.; Tian, K.; Wang, M.; Ning, Z.; Xia, F. Big trajectory data: A survey of applications and services. IEEE Access 2018, 6, 58295–58306. [Google Scholar] [CrossRef]
Fileto, R.; Raffaetà, A.; Roncato, A.; Sacenti, J.A.; May, C.; Klein, D. A semantic model for movement data warehouses. In Proceedings of the 17th International Workshop on Data Warehousing and OLAP, Shanghai, China, 3–7 November 2014; pp. 47–56. [Google Scholar]
Alsahfi, T.; Almotairi, M.; Elmasri, R. A survey on trajectory data warehouse. Spat. Inf. Res. 2020, 28, 53–66. [Google Scholar] [CrossRef]
Emmanouilidis, C.; Koutsiamanis, R.A.; Tasidou, A. Mobile guides: Taxonomy of architectures, context awareness, technologies and applications. J. Netw. Comput. Appl. 2013, 36, 103–125. [Google Scholar] [CrossRef]
Parent, C.; Spaccapietra, S.; Renso, C.; Andrienko, G.; Andrienko, N.; Bogorny, V.; Damiani, M.L.; Gkoulalas-Divanis, A.; Macedo, J.; Pelekis, N.; et al. Semantic trajectories modeling and analysis. ACM Comput. Surv. (CSUR) 2013, 45, 42. [Google Scholar] [CrossRef]
Almeida, D.R.d.; Baptista, C.d.S.; Andrade, F.G.d.; Soares, A. A Survey on Big Data for Trajectory Analytics. ISPRS Int. J. Geo-Inf. 2020, 9, 88. [Google Scholar] [CrossRef]
Fileto, R.; May, C.; Renso, C.; Pelekis, N.; Klein, D.; Theodoridis, Y. The Baquara2 Knowledge-Based Framework for Semantic Enrichment and Analysis of Movement Data. Data Knowl. Eng. 2015, 98, 104–122. [Google Scholar] [CrossRef]
Mello, R.d.S.; Bogorny, V.; Alvares, L.O.; Santana, L.H.Z.; Ferrero, C.A.; Frozza, A.A.; Schreiner, G.A.; Renso, C. MASTER: A multiple aspect view on trajectories. Trans. GIS 2019, 23, 805–822. [Google Scholar] [CrossRef]
Petry, L.M.; Ferrero, C.A.; Alvares, L.O.; Renso, C.; Bogorny, V. Towards semantic-aware multiple-aspect trajectory similarity measuring. Trans. GIS 2019, 23, 960–975. [Google Scholar] [CrossRef]
Sampaio, M.C.; Baptista, C.d.S.; Sousa, A.G.d.; Nascimento, F.F.d. Enhancing decision support systems with spatial capabilities. In Intelligent Databases: Technologies and Applications; IGI Global: Hershey, PA, USA, 2007; pp. 94–116. [Google Scholar]
Almeida, D.R.d.; Vasconcelos, S.P.d.; Andrade, F.G.; Baptista, C.d.S. Towards a hybrid and semantically enriched trajectory data warehouse. In Proceedings of the 2021 IEEE/ACS 18th International Conference on Computer Systems and Applications (AICCSA), Tangier, Morocco, 30 November–3 December 2021; pp. 1–8. [Google Scholar]
Silva, M.C.T.; Times, V.C.; Macêdo, J.A.d.; Renso, C. Swot: A conceptual data warehouse model for semantic trajectories. In Proceedings of the ACM Eighteenth International Workshop on Data Warehousing and OLAP, Melbourne, VIC, Australia, 19–23 October 2015; ACM: New York, NY, USA, 2015; pp. 11–14. [Google Scholar]
Wagner, R.; Macedo, J.A.F.d.; Raffaetà, A.; Renso, C.; Roncato, A.; Trasarti, R. Mob-warehouse: A semantic approach for mobility analysis with a trajectory data warehouse. In Proceedings of the International Conference on Conceptual Modeling, Hong Kong, China, 11–13 November 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 127–136. [Google Scholar]
Fayad, M.; Schmidt, D.C. Object-oriented application frameworks. Commun. ACM 1997, 40, 32–38. [Google Scholar] [CrossRef]
Brilhante, I.; Macedo, J.A.; Nardini, F.M.; Perego, R.; Renso, C. Tripbuilder: A tool for recommending sightseeing tours. In Proceedings of the European Conference on Information Retrieval, Amsterdam, The Netherlands, 13–16 April 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 771–774. [Google Scholar] [CrossRef]
Spaccapietra, S.; Parent, C. Adding meaning to your steps (keynote paper). In Proceedings of the International Conference on Conceptual Modeling, Brussels, Belgium, 31 October–3 November 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 13–31. [Google Scholar]
Chen, L.; Shang, S.; Jensen, C.S.; Yao, B.; Kalnis, P. Parallel semantic trajectory similarity join. In Proceedings of the 2020 IEEE 36th International Conference on Data Engineering (ICDE), Dallas, TX, USA, 20–24 April 2020; pp. 997–1008. [Google Scholar]
Zheng, B.; Yuan, N.J.; Zheng, K.; Xie, X.; Sadiq, S.; Zhou, X. Approximate keyword search in semantic trajectory database. In Proceedings of the 2015 IEEE 31st International Conference on Data Engineering, Seoul, Republic of Korea, 13–17 April 2015; pp. 975–986. [Google Scholar]
Campora, S.; Macedo, J.A.F.d.; Spinsanti, L. St-toolkit: A framework for trajectory data warehousing. In Proceedings of the 14th AGILE Conference on Geographic Information Science, Utrecht, The Netherlands, 18–21 April 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 1–12. [Google Scholar]
Laube, P. The low hanging fruit is gone: Achievements and challenges of computational movement analysis. SIGSPATIAL Spec. 2015, 7, 3–10. [Google Scholar] [CrossRef]
Soares, A.; Rose, J.; Etemad, M.; Renso, C.; Matwin, S. Vista: A visual analytics platform for semantic annotation of trajectories. In Proceedings of the 22nd International Conference on Extending Database Technology (EDBT), Lisbon, Portugal, 26–29 March 2019; pp. 570–573. [Google Scholar]
Soares, A.; Renso, C.; Matwin, S. An active learning system for trajectory classification. IEEE Comput. Graph. Appl. 2017, 37, 28–39. [Google Scholar] [CrossRef] [PubMed]
Zheng, Y.; Chen, Y.; Li, Q.; Xie, X.; Ma, W.Y. Understanding transportation modes based on gps data for web applications. ACM Trans. Web (TWEB) 2010, 4, 1. [Google Scholar] [CrossRef]
Yan, Z.; Chakraborty, D.; Parent, C.; Spaccapietra, S.; Aberer, K. SeMiTri: A framework for semantic annotation of heterogeneous trajectories. In Proceedings of the 14th International Conference on Extending Database Technology, Uppsala, Sweden, 21–24 March 2011; ACM: New York, NY, USA, 2011; pp. 259–270. [Google Scholar]
Yang, L.; Hu, Z.; Long, J.; Guo, T. 5w1h-based conceptual modeling framework for domain ontology and its application on stpo. In Proceedings of the 2011 Seventh International Conference on Semantics, Knowledge and Grids, Beijing, China, 24–26 October 2011; pp. 203–206. [Google Scholar]
Bogorny, V.; Renso, C.; de Aquino, A.R.; de Lucca Siqueira, F.; Alvares, L.O. Constant—A Conceptual Data Model for Semantic Trajectories of Moving Objects. Trans. GIS 2014, 18, 66–88. [Google Scholar] [CrossRef]
Leonardi, L.; Marketos, G.; Frentzos, E.; Giatrakos, N.; Orlando, S.; Pelekis, N.; Raffaetà, A.; Roncato, A.; Silvestri, C.; Theodoridis, Y. T-warehouse: Visual olap analysis on trajectory data. In Proceedings of the 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010), Long Beach, CA, USA, 1–6 March 2010; pp. 1141–1144. [Google Scholar]
Leonardi, L.; Orlando, S.; Raffaetà, A.; Roncato, A.; Silvestri, C.; Andrienko, G.; Andrienko, N. A general framework for trajectory data warehousing and visual OLAP. GeoInformatica 2014, 18, 273–312. [Google Scholar] [CrossRef]
Manaa, M.; Akaichi, J. Ontology-based trajectory data warehouse conceptual model. In Proceedings of the International Conference on Big Data Analytics and Knowledge Discovery, Porto, Portugal, 9–11 November 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 329–342. [Google Scholar]
Wu, X.; Yu, J.; Zhao, X. Spatio-temporal keyword query in semantic trajectories. Front. Comput. Sci. 2022, 16, 162602. [Google Scholar] [CrossRef]
Sun, H.; Guo, X.; Yang, Z.; Chu, X.; Liu, X.; He, L. Predicting future locations with semantic trajectories. ACM Trans. Intell. Syst. Technol. (TIST) 2022, 13, 1–20. [Google Scholar] [CrossRef]
Ribeiro de Almeida, D.; de Souza Baptista, C.; de Andrade, F.G. Similarity search on semantic trajectories using text processing. ISPRS Int. J. Geo-Inf. 2022, 11, 412. [Google Scholar] [CrossRef]
Garani, G.; Arboleda, F.J.M.; Verykios, V.S. A novel approach for handling semantic trajectories on data warehouses. Intell. Decis. Technol. 2022, 16, 679–690. [Google Scholar] [CrossRef]
Luo, Y.; Cao, Z.; Jin, X.; Liu, K.; Yin, L. Deciphering Human Mobility: Inferring Semantics of Trajectories with Large Language Models. In Proceedings of the 2024 25th IEEE International Conference on Mobile Data Management (MDM), Brussels, Belgium, 24–27 June 2024; pp. 289–294. [Google Scholar]
Seep, J. Analyzing Semantically Enriched Trajectories. Künstliche Intell. 2024, 38, 127–131. [Google Scholar] [CrossRef]
Pugliese, C.; Lettich, F.; Renso, C.; Pinelli, F. MAT-Builder: A System to Build Semantically Enriched Trajectories. In Proceedings of the 23rd IEEE International Conference on Mobile Data Management, MDM 2022, Paphos, Cyprus, 6–9 June 2022; pp. 274–277. [Google Scholar]
Pugliese, C.; Lettich, F.; Pinelli, F.; Renso, C. Summarizing Trajectories Using Semantically Enriched Geographical Context. In Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems, SIGSPATIAL 2023, Hamburg, Germany, 13–16 November 2023; ACM: New York, NY, USA, 2023; pp. 1–10. [Google Scholar]
Pugliese, C.; Lettich, F.; Pinelli, F.; Renso, C. Understanding human mobility dynamics: Insights from summarized semantic trajectories. In Proceedings of the 2024 25th IEEE International Conference on Mobile Data Management (MDM), Brussels, Belgium, 24–27 June 2024; pp. 159–164. [Google Scholar]
Hamann, J.; Hagen, T. Revealing Trip Purposes in Raw GPS Data by Applying a Multi-Phase Clustering Approach to Semantic Trajectories. IEEE Trans. Intell. Transp. Syst. 2024, 26, 3543–3556. [Google Scholar] [CrossRef]
Jurafsky, D.; Martin, J.H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd ed.; Prentice Hall: Englewood Cliffs, NJ, USA, 2009. [Google Scholar]
Vaisman, A.; Zimányi, E. Data Warehouse Systems: Design and Implementation; Springer: Berlin/Heidelberg, Germany, 2014. [Google Scholar]

Figure 1. Sequence of semantic annotations of a trajectory.

Figure 2. Conceptual Schema of an STrDW.

Figure 3. AspectDimension in a data cube example.

Figure 4. ATrDW star schema.

Figure 5. The ATrDW framework architecture.

Figure 6. Chart of query times in descending order of execution time.

Table 1. Query to obtain all ${AD}_{ex}$ values.

Description: Return All ${AD}_{ex}$ Aspect Values
SELECT regexLookbehind(1, $Γ$), regexLookbehind(2, $Γ$),
regexLookbehind(3, $Γ$), regexLookbehind(4, $Γ$)
union all
SELECT regexLookbehind(1, V1), regexLookbehind(2, V1),
regexLookbehind(3, V1), regexLookbehind(4, V1)
union all
SELECT regexLookbehind(1, V2), regexLookbehind(2, V2),
regexLookbehind(3, V2), regexLookbehind(4, V2)
union all
SELECT regexLookbehind(1, V3), regexLookbehind(2, V3),
regexLookbehind(3, V3), regexLookbehind(4, V3)

Table 2. SQL query result from Table 1.

Weather	Temperature	Heartbeat	Transport
clear	69.1	70	subway
rain	67.4	60	taxi
cloud	68.5	65	walk
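
The positional lookup behind Tables 1 and 2 can be mimicked with a short Python sketch. This is not the authors' `regexLookbehind` implementation: the helper `lookup` is hypothetical, and the sketch assumes that each aspect value is stored as one string whose fields are separated by a comma and a space, with Γ holding the field names.

```python
import re

# Hypothetical stand-ins for the strings in Table 1: GAMMA plays the role of Γ
# (the aspect names) and V1..V3 are aspect values, all comma-separated.
GAMMA = "Weather, Temperature, Heartbeat, Transport"
V1 = "clear, 69.1, 70, subway"
V2 = "rain, 67.4, 60, taxi"
V3 = "cloud, 68.5, 65, walk"


def lookup(position: int, text: str) -> str:
    """Return the field at a 1-based position of a comma-separated string.

    Assumed behaviour only: skip (position - 1) fields, then capture the next one.
    """
    pattern = r"^(?:[^,]*,\s*){%d}([^,]*)" % (position - 1)
    match = re.match(pattern, text)
    if match is None:
        raise ValueError(f"no field {position} in {text!r}")
    return match.group(1).strip()


def row(text: str) -> tuple[str, ...]:
    """Mimic one SELECT of Table 1: look up positions 1 to 4."""
    return tuple(lookup(i, text) for i in range(1, 5))


# The UNION ALL of Table 1 becomes a list of rows; the first row is the header.
table = [row(s) for s in (GAMMA, V1, V2, V3)]
for r in table:
    print("\t".join(r))
```

Under these assumptions the printed rows have the same layout as Table 2, with the row derived from Γ acting as the header.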

Table 3. Pattern matching using SQL LIKE operator.

Expression	Result
$V_{1}$ like ‘clear%’ =	true
$V_{1}$ like ‘%subway’ =	true
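
The two expressions in Table 3 can be checked with an in-memory SQLite database from Python. This is only a sketch: the helper `sql_like` is hypothetical, $V_{1}$ is assumed to be the first data row of Table 2 stored as one comma-separated string, and SQLite's `LIKE` is case-insensitive for ASCII by default, which may differ from the DBMS used in the paper.

```python
import sqlite3

# Assumed encoding of V1: the first data row of Table 2 as a single string.
V1 = "clear, 69.1, 70, subway"

conn = sqlite3.connect(":memory:")


def sql_like(value: str, pattern: str) -> bool:
    """Evaluate `value LIKE pattern` with SQLite and return a Python bool."""
    (result,) = conn.execute("SELECT ? LIKE ?", (value, pattern)).fetchone()
    return bool(result)


starts_with_clear = sql_like(V1, "clear%")  # prefix match on the first field
ends_with_subway = sql_like(V1, "%subway")  # suffix match on the last field
starts_with_rain = sql_like(V1, "rain%")    # a different weather value fails
print(starts_with_clear, ends_with_subway, starts_with_rain)
```

`LIKE` anchors conveniently on the first and last fields of the string; constraints on a field in the middle are what the regex operator of Table 4 addresses.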

Table 4. Pattern matching using regex operator.

Constraint	Operation	Result
$ν_{2}$ == 60	$V_{2}$ ~ ‘^δ, 60’	false
$ν_{3}$ == 60	$V_{2}$ ~ ‘^δ, δ, 60’	true
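
The two rows of Table 4 can be reproduced with Python's `re` module. This is a minimal sketch under two assumptions that the table itself does not state: $V_{2}$ is the second data row of Table 2 stored as one comma-separated string, and δ is a placeholder for any single field (a run of characters without a comma). The helpers `expand` and `matches` are hypothetical.

```python
import re

# Assumed encoding of V2: the second data row of Table 2 as a single string.
V2 = "rain, 67.4, 60, taxi"

# Assumption: δ stands for "any single field", i.e. any run of characters
# that contains no comma.
DELTA = r"[^,]*"


def expand(pattern: str) -> str:
    """Replace every δ in a Table 4 style pattern with the DELTA regex."""
    return pattern.replace("δ", DELTA)


def matches(value: str, pattern: str) -> bool:
    """Emulate the `~` operator: True if the regex matches anywhere in value."""
    return re.search(expand(pattern), value) is not None


second_is_60 = matches(V2, "^δ, 60")    # field 2 is 67.4, so no match
third_is_60 = matches(V2, "^δ, δ, 60")  # field 3 is 60, so it matches
print(second_is_60, third_is_60)
```

Each δ skips exactly one field, so the number of δ placeholders selects the position being tested. Without a trailing delimiter, a pattern ending in `60` would also accept a field that merely starts with 60, such as 600.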

Table 5. Queries on semantic trajectories.

Query ID	Query
Q1	What was the average distance traveled by people who used public transport to go to school?
Q2	What was the average distance traveled by people who used public transportation to visit the City College of New York?
Q3	How many trajectories had average speed greater than 25 mph in rainy weather in New York City?
Q4	How many trajectories in New York City during rainy weather have an average speed greater than 25 mph?
Q5	What was the total distance traveled in 2012 by all users in New York City who made at least one stop at a restaurant during their trips?
Q6	What was the total distance traveled by all users in New York City during the 2012 semesters who made at least one stop at Liberty State Park?
Q7	What is the average speed of users when they are driving from home to a mall?
Q8	On average, how long does it take a person to leave an entertainment venue and visit a highly rated restaurant ¹?
Q9	Approximately, what is the total distance traveled by people within the State of New Jersey in the year 2012, satisfying the Home-Work-Entertainment moving pattern, not necessarily consecutive, lasting at least 4 h?
Q10	On average, how long did it take people to visit Central Park and then Times Square on a clear morning?
Q11	In 2012, what is the total distance traveled by people who started at the New York Sports Clubs, and some time later used a subway to get to the Mall?
Q12	What is the average length per month of trajectories that started at some entertainment venue, then spent some time in Central Park, and ended up somewhere in New York that was highly rated and expensive ²?

¹ We consider highly rated as rating > 6. ² We consider high price as price >= 2.

Table 6. SQL queries of selected examples Q3, Q8 and Q10.

Query ID	Query
Q3	SELECT COUNT(distinct f.num_trajectory) FROM fact f, tb_aspect asp, tb_poi poi WHERE f.id_aspect = asp.id AND f.id_poi = poi.id AND f.position > 1 AND asp.value LIKE 'Rain, %' AND poi.city = 'New York' AND f.duration > 0 AND ((f.distance)/(f.duration) > 25)
Q8	WITH entertainment AS ( SELECT num_trajectory, f.id_user, f.position FROM fact f, tb_poi dimPoi WHERE f.id_poi = dimPoi.id AND dimPoi.category = 'Entertainment' GROUP BY num_trajectory, f.id_user, f.position ) SELECT SUM(f.duration)/COUNT(f.id_user) FROM fact f, tb_poi dimPoi, tb_aspect dimAspect, entertainment WHERE f.id_poi = dimPoi.id AND dimAspect.id = f.id_aspect AND dimPoi.category = 'Restaurant' AND f.num_trajectory = entertainment.num_trajectory AND f.id_user = entertainment.id_user AND f.position = entertainment.position + 1 AND (regexlookbehind(2, dimAspect.value)::numeric) > 6
Q10	WITH CentralPark AS ( SELECT num_trajectory, f.id_user, f.position FROM fact f, tb_poi dimPoi WHERE f.id_poi = dimPoi.id AND dimPoi.name LIKE '%Central Park%' GROUP BY num_trajectory, f.id_user, f.position ) SELECT SUM(f.duration)/COUNT(f.id_user) FROM fact f, tb_poi dimPoi, tb_aspect dimAspect, tb_time dimTime, CentralPark WHERE f.id_poi = dimPoi.id AND f.id_aspect = dimAspect.id AND f.id_time = dimTime.id AND dimPoi.name LIKE '%Times Square%' AND dimAspect.value ~ 'Clear,' AND dimTime.hour >= 5 AND dimTime.hour < 12 AND f.num_trajectory = CentralPark.num_trajectory AND f.id_user = CentralPark.id_user AND f.position = CentralPark.position + 1
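
To see how the filters of Q3 interact, the query can be run against a toy star schema in an in-memory SQLite database. Everything here is an illustrative assumption rather than the authors' setup: the identifiers are taken to be written with underscores (for example `num_trajectory` and `tb_aspect`), only the columns that Q3 touches are modelled, distance and duration are assumed to be in miles and hours, and all rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Toy star schema; only the columns that Q3 touches are modelled.
    CREATE TABLE tb_aspect (id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE tb_poi (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE fact (
        num_trajectory INTEGER, id_aspect INTEGER, id_poi INTEGER,
        position INTEGER, distance REAL, duration REAL
    );
    """
)

# Invented dimension rows: one rainy and one clear aspect, two cities.
conn.executemany(
    "INSERT INTO tb_aspect VALUES (?, ?)",
    [(1, "Rain, 67.4, 60, taxi"), (2, "Clear, 69.1, 70, subway")],
)
conn.executemany(
    "INSERT INTO tb_poi VALUES (?, ?)", [(1, "New York"), (2, "Newark")]
)

# Invented fact rows: (num_trajectory, id_aspect, id_poi, position, distance, duration).
conn.executemany(
    "INSERT INTO fact VALUES (?, ?, ?, ?, ?, ?)",
    [
        (100, 1, 1, 1, 30.0, 1.0),  # first position: excluded by position > 1
        (100, 1, 1, 2, 30.0, 1.0),  # rain, New York, 30 mph: counted
        (100, 1, 1, 3, 60.0, 2.0),  # same trajectory: DISTINCT counts it once
        (101, 2, 1, 2, 40.0, 1.0),  # clear weather: excluded
        (102, 1, 2, 2, 40.0, 1.0),  # other city: excluded
        (103, 1, 1, 2, 10.0, 1.0),  # 10 mph: too slow
        (104, 1, 1, 2, 10.0, 0.0),  # zero duration: excluded by duration > 0
        (105, 1, 1, 2, 52.0, 2.0),  # 26 mph: counted
    ],
)

Q3 = """
SELECT COUNT(DISTINCT f.num_trajectory)
FROM fact f, tb_aspect asp, tb_poi poi
WHERE f.id_aspect = asp.id AND f.id_poi = poi.id
  AND f.position > 1
  AND asp.value LIKE 'Rain, %'
  AND poi.city = 'New York'
  AND f.duration > 0
  AND ((f.distance) / (f.duration) > 25)
"""
(count,) = conn.execute(Q3).fetchone()
print(count)
```

With these invented rows only trajectories 100 and 105 satisfy every condition. The weather constraint is a plain prefix match on the aspect string, which works because weather is the first field of the value.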

Table 7. Average query execution time.

Query	Time (ms)
Q1	35.1
Q2	36.8
Q3	30.5
Q4	19.2
Q5	93.4
Q6	36.7
Q7	40.1
Q8	31.4
Q9	83.4
Q10	39.9
Q11	57.8
Q12	92.1
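
The ordering described in the caption of Figure 6 can be recomputed from the values in Table 7. The snippet below is a small sketch that only sorts and averages the published numbers; the variable names are illustrative.

```python
# Average execution times in milliseconds, copied from Table 7.
times_ms = {
    "Q1": 35.1, "Q2": 36.8, "Q3": 30.5, "Q4": 19.2,
    "Q5": 93.4, "Q6": 36.7, "Q7": 40.1, "Q8": 31.4,
    "Q9": 83.4, "Q10": 39.9, "Q11": 57.8, "Q12": 92.1,
}

# Order the queries from slowest to fastest, as the Figure 6 caption describes.
ranking = sorted(times_ms.items(), key=lambda item: item[1], reverse=True)
for query, ms in ranking:
    print(f"{query}\t{ms:.1f}")

# Mean over the twelve queries.
mean_ms = sum(times_ms.values()) / len(times_ms)
print(f"mean\t{mean_ms:.1f}")
```

Q5 is the slowest query (93.4 ms), Q4 the fastest (19.2 ms), and the mean over all twelve queries is about 49.7 ms.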

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Published by MDPI on behalf of the International Society for Photogrammetry and Remote Sensing. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
