Enhancing Efficiency in Transportation Data Storage for Electric Vehicles: The Synergy of Graph and Time-Series Databases

Šidlovský, Marko; Ravas, Filip

doi:10.3390/wevj16050269


by Marko Šidlovský *,† and Filip Ravas †

Faculty of Transportation Sciences, Czech Technical University in Prague, Konviktská 20, 110 00 Prague, Czech Republic

* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.

World Electr. Veh. J. 2025, 16(5), 269; https://doi.org/10.3390/wevj16050269

Submission received: 14 January 2025 / Revised: 30 March 2025 / Accepted: 6 April 2025 / Published: 14 May 2025


Abstract

This article introduces a novel hybrid database architecture that combines graph and time-series databases to enhance the storage and management of transportation data, particularly for electric vehicles (EVs). This model addresses a critical challenge in modern mobility: handling large-scale, high-velocity, and highly interconnected datasets while maintaining query efficiency and scalability. By comparing a naive graph-only approach with our hybrid solution, we demonstrate a significant reduction in query response times for large data contexts, with queries completing up to 64% faster in the XL scenario. The scientific contribution of this research lies in its practical implementation of a dual-layer storage framework that aligns with FAIR data principles and real-time mobility needs. Moreover, the hybrid model supports complex analytics, such as EV battery health monitoring, dynamic route optimization, and charging behavior analysis. These capabilities offer a multiplier effect, enabling broader applications across urban mobility systems, fleet management platforms, and energy-aware transport planning. By explicitly considering the interconnected nature of transport and energy data, this work contributes to both carbon emission reduction and smart city efficiency on a global scale.

Keywords:

hybrid database architecture; graph databases; time-series data; electric vehicles (EVs); transportation data management; data storage optimization; mobility as a service (MaaS); big data in transportation

1. Introduction

In recent years, the electrification of transportation has led to a rapid expansion of electric vehicle (EV) ecosystems, resulting in an unprecedented volume and complexity of data. Unlike traditional internal combustion vehicles, modern EVs continuously generate large-scale time-series telemetry (e.g., battery state-of-charge, charging events, and temperature readings) while interacting with graph-based structures such as road networks, charger locations, and vehicle-user relationships. These data streams exhibit the “four Vs” of big data—Volume, Velocity, Variety, and Veracity—and present unique challenges for storage, retrieval, and real-time analytics.

Efficient data management in this context is essential not only for technical performance but also for achieving broader societal goals, such as reducing carbon emissions, optimizing energy consumption, and minimizing time spent in traffic. Traditional relational databases often struggle to meet these needs due to their limitations in managing dynamic, interconnected, and high-frequency data.

Existing transport data systems, whether relational or graph-based, struggle to manage the combined spatial and temporal complexity of electric mobility data. Graph databases are effective for modeling relationships but are not optimized for high-frequency telemetry, while time-series databases handle temporal data well but lack support for complex network structures. This mismatch is especially limiting in EV ecosystems, where real-time sensor data and dynamic routing must be analyzed together.

To address these challenges, this study introduces and evaluates a novel hybrid database architecture that integrates graph and time-series databases. The hybrid model enables scalable, flexible, and efficient storage of transportation data by leveraging the strengths of both systems. Through a comparative analysis with a naive graph-only approach, we demonstrate the hybrid model’s superior performance in large-scale scenarios, making it a suitable solution for the growing demands of electric mobility data management.

The scientific contribution of this research lies in the practical implementation and benchmarking of a hybrid data storage model tailored for real-time electric mobility systems. This architecture is not only technically effective, but also broadly applicable to domains such as fleet management, smart transportation networks, and infrastructure planning. As a result, the proposed approach offers a multiplier effect: by enabling data-driven decisions in EV systems, it indirectly contributes to global sustainability goals and advances in smart city development.

The remainder of this paper is organized as follows: Section 2 presents a literature review on transport system models, Mobility as a Service (MaaS), and graph–time-series database integration. Section 3 outlines the research methodology and dataset. Section 4 presents experimental results comparing the naive and hybrid approaches. Section 5 discusses the broader implications of the findings. Finally, Section 6 summarizes the study’s key contributions and outlines directions for future research.

2. Literature Review

2.1. Transport System Models

Transport system models are essential for simulating, analyzing, and optimizing transportation networks. These models are typically divided into three categories:

2.1.1. Demand Models

Demand models estimate travel behavior by incorporating socio-economic and behavioral factors. Common approaches include:

Trip Generation Models: Predict the number of trips based on land use and population data [1].
Trip Distribution Models: Allocate trips between origins and destinations using gravity or activity-based methods [2].
Mode Choice Models: Evaluate how travelers select between different modes (e.g., public transit, private vehicles, walking) based on cost, time, and convenience [3].
Route Assignment Models: Determine how trips are distributed across available networks, often via shortest-path or equilibrium methods [4].

Required Data: These include household travel surveys, census data, trip diaries, and smart card information [5].

2.1.2. Supply Models

Supply models detail the infrastructure and services of a transportation system:

Road Network Models: Capture road geometry, connectivity, and traffic control elements [6].
Public Transport Models: Outline schedules, routes, vehicle capacities, and passenger loads [7].
Logistics and Freight Models: Describe the movement of goods, including warehouse locations and delivery routes [8].

Required Data: These models rely on GIS datasets, traffic sensor data, transit timetables, and vehicle tracking information [9].

2.1.3. Interaction Models

Interaction models integrate both supply and demand aspects to simulate complex dynamics such as congestion and multimodal transfers, with modular simulation frameworks enabling the modeling of interaction-rich transport systems through agent-based approaches that capture ad hoc interactions and decision-making processes [10].

Required Data: Essential inputs include real-time traffic data, vehicle telematics, origin-destination matrices, and travel time reliability metrics [5].

2.2. Measuring Acceptance of Mobility as a Service (MaaS)

A key emerging area in transportation research is the assessment of user acceptance for Mobility as a Service (MaaS) systems, i.e., integrated platforms that combine multiple mobility options (e.g., ride-hailing, public transit, car-sharing, micromobility) [11]. Research methods in this area include:

Stated Preference Surveys: Elicit hypothetical choices from users [12].
Behavioral Data Analysis: Use real-world data from apps and GPS tracking to evaluate mobility patterns [13].
Machine Learning and Predictive Modeling: Analyze large-scale datasets to understand factors such as cost sensitivity and convenience [14].

Relevant Data Sources: App-based travel logs, transaction records, and behavioral surveys are commonly employed [11].

Recent pilot surveys by Rindone and Vitetta [12] provide empirical evidence on MaaS acceptance. Their findings indicate that user acceptance increases with rising generalized transport costs. They also demonstrate the benefit of combining Stated Preference (SP), Revealed Preference (RP), and Technology Acceptance Model (TAM) methodologies to comprehensively assess user behavior; these insights are valuable for the design of large-scale MaaS implementations.

2.3. Graph and Hybrid Database Integration

Graph databases have gained significant traction for their ability to efficiently store and query highly associative data [15,16]. Their strength lies in representing complex networks of relationships, a capability particularly useful in domains such as social networking and transportation. For example, ref. [17] explores advanced querying techniques, while ref. [18] offers a comprehensive survey of graph database models, discussing data structures, query languages, and integrity constraints.

In terms of performance, studies such as refs. [19,20] have identified systems like DEX and Neo4j as efficient options, with further enhancements available through multi-threading and GPU-based implementations. Additional comparisons by [21,22] demonstrate that indexing and optimization techniques can yield significant improvements. In transportation applications, refs. [23,24] illustrate the benefits of using graph databases to construct transportation knowledge graphs, while ref. [25] shows that graph databases often outperform traditional relational systems, though challenges remain for memory-intensive and large-scale networks [26].

To address these challenges, hybrid architectures that integrate relational, graph, and time-series databases have emerged [27,28]. Graph databases excel at capturing complex network relationships and multimodal interactions, whereas time-series databases efficiently store real-time data such as traffic conditions and vehicle telematics. This integrated approach not only enables efficient management of heterogeneous data types but also enhances our ability to simulate transport scenarios, predict MaaS adoption trends, and optimize overall mobility services [5].

3. Methodology

This research methodology is structured to provide a rigorous and systematic examination of the performance of different database storage systems within the electric transportation sector. While our current dataset does not contain actual EV telemetry, the following approach is designed to handle EV-specific data, including potential battery metrics (state-of-charge, temperature), charging station interactions, and complex routing information, if and when such data become available. This section delineates the approach undertaken, encompassing research design, data collection, and data analysis methods.

3.1. Research Design

The research design employed in this study is a quantitative approach, primarily focusing on empirical data analysis. This method was selected due to its suitability in objectively evaluating and comparing the performance metrics of different database systems. The quantitative nature of the design allows for the precise measurement of variables such as query response time. This approach enables the application of statistical methods to assess the efficacy of the hybrid database system in contrast to a traditional graph database system, providing concrete, numerical evidence to support our findings. The design is adaptable for EV data by incorporating additional variables (e.g., battery details, charging details, temperature) into the same evaluation framework. In this way, although our current dataset comprises primarily standard car-sharing parameters, the methodology remains applicable to high-frequency, multifaceted EV data.

3.2. Data Collection

The data collection process for this study was designed with meticulous attention to detail, aiming to encompass a wide array of pertinent transportation data.

The primary dataset for this research was sourced from the Uniqway project. Uniqway [29], a pioneering Czech student carsharing service, emerged from the collaborative efforts of students and faculties from the Czech Technical University in Prague, Czech University of Life Sciences, and the University of Economics, Prague, alongside the support of ŠKODA AUTO Digilab s.r.o. Launched on 17 October 2018, Uniqway represents a unique integration of academic and practical realms in transportation, offering a service specifically tailored for students and staff of higher education institutions. With a distinctive fleet comprising various ŠKODA models, Uniqway facilitates mobility through a user-friendly app, enabling efficient vehicle reservations and management. This service exemplifies innovative transportation solutions and provides a rich dataset reflective of real-world urban mobility patterns, making it an ideal subject for transportation data studies.

The Uniqway dataset includes transportation-related parameters such as car_id, speed, locked, gps_lat, gps_lon, and received_date, collected every 10 s during active trips. Each record is stored in a structured format (JSON), enabling time-based and geospatial analysis. An example record may look like:

{ "car_id": 12, "speed": 48.5, "gps_lat": 50.076, "gps_lon": 14.423, "locked": false, "received_date": "2023-01-05T14:23:10Z" }
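As an illustration of how such a record can be prepared for time-based and geospatial analysis, the following minimal Python sketch parses the example above into a typed structure. The names `TelemetryRecord` and `parse_record` are illustrative assumptions and are not part of the Uniqway tooling; the sketch also assumes that the trailing "Z" in received_date denotes UTC.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TelemetryRecord:
    """One telemetry sample; field names follow the example record."""
    car_id: int
    speed: float
    gps_lat: float
    gps_lon: float
    locked: bool
    received_date: datetime


def parse_record(raw: str) -> TelemetryRecord:
    """Parse one JSON record; the trailing 'Z' is interpreted as UTC (assumption)."""
    obj = json.loads(raw)
    ts = datetime.strptime(obj["received_date"], "%Y-%m-%dT%H:%M:%SZ")
    return TelemetryRecord(
        car_id=int(obj["car_id"]),
        speed=float(obj["speed"]),
        gps_lat=float(obj["gps_lat"]),
        gps_lon=float(obj["gps_lon"]),
        locked=bool(obj["locked"]),
        received_date=ts.replace(tzinfo=timezone.utc),
    )


# The example record from the text, parsed into a typed object.
example = (
    '{ "car_id": 12, "speed": 48.5, "gps_lat": 50.076, "gps_lon": 14.423, '
    '"locked": false, "received_date": "2023-01-05T14:23:10Z" }'
)
record = parse_record(example)
```

A timezone-aware timestamp makes the later date-range filtering (the 'When' element of a query context) unambiguous.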

To simulate potential EV use cases, we conceptually extended the dataset’s structure to include key EV telemetry fields, such as battery state-of-charge (SOC), state-of-health (SOH), charging power, charging duration, and temperature—although these fields were not populated with real data. At the time of analysis, Uniqway operated only one EV, and insufficient telemetry was available to support detailed EV modeling. Nevertheless, these fields are fully supported by our architecture, as outlined in Table 1, allowing for future integration of real-time EV data.

By combining energy data with transportation variables such as route trajectories, trip frequency, and driving behavior, the proposed hybrid database system enables a connected analysis of mobility and energy consumption. This approach supports practical applications such as charging infrastructure planning, dynamic energy demand forecasting, and battery degradation monitoring—key aspects of sustainable and efficient EV fleet management.

For this analysis, data from January 2023 provided a comprehensive snapshot of transportation patterns. The parameters and detailed information of the dataset are clearly presented in Table 1 and Table 2.

3.3. Data Storage Architecture

In our study, a consistent methodology was applied across both database architectures for the aggregation of geolocation data, utilizing the H3 geospatial indexing system [30]. H3 is a geospatial indexing framework that partitions the globe into hexagonal cells, facilitating detailed and precise geographical data representation. It is open source, released under the Apache 2.0 license, and its core library provides a comprehensive suite of functions. These include the transformation of latitude and longitude coordinates into corresponding H3 cells, determination of cell centers, extraction of cell boundary geometries, and identification of neighboring cells, among others. Throughout our analysis, we maintained a uniform H3 resolution of 12, ensuring consistency and precision in our geospatial data handling across both database solutions.

Two different database storage options were employed:

3.3.1. Naive: Graph Database Only

Using Amazon Neptune [31] on Amazon Web Services (AWS), the graph database system was structured with two types of nodes:

Data Nodes: One node in this category is generated for every individual data point in the input dataset. These nodes carry the parameters outlined in Section 3.2.
H3 Nodes: These nodes serve as geographical aggregators of data points situated in close proximity. They are distinct and are derived from the GPS locations of the data points. An H3 Node is shared between two data points if their GPS positions fall within the same H3 index. Each node possesses a unique h3 index parameter associated with a specific resolution.

Regarding the relational structure, an arc is established between an H3 Node and a Data Node when the GPS location of the data point falls within the boundaries of the H3 polygon corresponding to that node’s index. This arc maps the geospatial connectivity between individual data points and their aggregated geolocation nodes. The types of data nodes and their relationships are illustrated in Figure 1.
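The node and arc structure described above can be sketched in memory as follows. This is a simplified illustration, not the Neptune implementation: `grid_cell` is a stand-in that snaps coordinates to a square grid and is explicitly not the H3 algorithm, the sample values are invented, and date filtering is omitted for brevity.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def grid_cell(lat: float, lon: float) -> str:
    """Stand-in for an H3 lookup (NOT real H3): snaps to a 0.001-degree square grid."""
    return f"{round(lat, 3):.3f}:{round(lon, 3):.3f}"


def build_naive_graph(
    records: Iterable[dict],
    cell_of: Callable[[float, float], str] = grid_cell,
) -> Tuple[List[dict], Dict[str, List[int]]]:
    """Create one Data Node per record and one arc from its cell node to it."""
    data_nodes = list(records)
    arcs: Dict[str, List[int]] = defaultdict(list)
    for idx, rec in enumerate(data_nodes):
        arcs[cell_of(rec["gps_lat"], rec["gps_lon"])].append(idx)
    return data_nodes, dict(arcs)


def max_speed(data_nodes: List[dict], arcs: Dict[str, List[int]],
              cells: Iterable[str]) -> Optional[float]:
    """Traverse from the selected cell nodes to every attached Data Node."""
    speeds = [data_nodes[i]["speed"] for c in cells for i in arcs.get(c, [])]
    return max(speeds) if speeds else None


# Invented sample points; the first two fall into the same stand-in cell.
SAMPLE = [
    {"car_id": 12, "speed": 48.5, "gps_lat": 50.0761, "gps_lon": 14.4231},
    {"car_id": 7, "speed": 31.0, "gps_lat": 50.0762, "gps_lon": 14.4232},
    {"car_id": 12, "speed": 52.0, "gps_lat": 50.0900, "gps_lon": 14.4000},
]
data_nodes, arcs = build_naive_graph(SAMPLE)
fastest = max_speed(data_nodes, arcs, ["50.076:14.423"])
```

Because every aggregate requires visiting each Data Node attached to the selected cells, the work per query grows with the number of raw data points.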

3.3.2. Hybrid Approach

In the hybrid approach, the core principle involves storing raw data exclusively in the Time-series database (AWS Timestream [31]). In contrast, the AWS Neptune graph database functions primarily as a reference system, utilizing nodes as pointers to the Time-series database. This configuration comprises three distinct node types: H3 nodes for geolocation aggregation, Data Pointer nodes referencing specific entries in the Time-series database, and Info Holder nodes containing aggregated results of particular parameters. The hybrid approach operates in two phases: information creation and information retrieval. The information creation phase involves generating Data Pointer and Info Holder nodes and interacting with the Time-series database, whereas the retrieval phase is expedient, primarily querying the Graph database.

Node Types:

H3 Nodes: These nodes are similar to those described in the naive approach (Section 3.3.1), serving as geolocation aggregators.
Data Pointer Nodes: Functioning as intermediaries, these nodes link to pertinent data in the Time-series database. They contain parameters like the time-series table and date and are associated with specific H3 Nodes. A Data Pointer Node is created if at least one data point exists for a given geolocation and date. The node’s parameters are instrumental in formulating queries to the time-series table.
Info Holder Nodes: These nodes encapsulate additional insights derived from the data. Connected to Data Pointer nodes, they store computed results, like the maximum of selected parameters, for future reference.

Arc Relationships:

Between H3 Node and Data Pointer Node: An arc is established if the time-series data contains a record for a specific geolocation and date. An H3 Node can link to multiple Data Pointer Nodes, differentiated by the date parameter.
Between Data Pointer Node and Info Holder Node: Following the selection of required information (e.g., maximum of a parameter), a Data Pointer Node directs to an Info Holder Node with the computed result. A single Data Pointer can link to numerous Info Holder Nodes for different computational functions.

Figure 2 showcases the various data node types and their interconnections.

Operational Mechanics in the Hybrid Database:

Create Information: This process involves defining the context of the desired information, encompassing three elements: what (parameter and aggregation function), when (time frame), and where (geolocation). The operational steps are as follows:
(a) Query H3 Nodes based on the geolocation criterion.
(b) Upon selecting the relevant H3 Nodes, initiate and execute a query to the Time-series database to perform the aggregation function for the associated dates.
(c) The outcome of this query is then recorded in new Data Pointer Nodes, which are organized according to the dates, and in new Info Holder Nodes that encapsulate the results of the aggregated function.
Get Information: The retrieval process is guided by the context (what, when, where), traversing H3 Nodes (for location filtering), Data Pointer Nodes (for time filtering), and Info Holder Nodes (for parameter filtering). The outcome is the aggregated result from the selected Info Holder Nodes.
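The two phases can be sketched with plain Python containers standing in for both databases. All names (`HybridGraph`, `create_information`, `get_information`), the cell identifiers, and the sample rows are illustrative assumptions; only the maximum aggregate is implemented, and the loop in phase one plays the role of the time-series query.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Iterable, Optional, Set, Tuple

# Stand-in for the time-series table: (cell, day, speed) rows with invented values.
ROWS = [
    ("cell_a", date(2023, 1, 5), 48.5),
    ("cell_a", date(2023, 1, 5), 61.0),
    ("cell_a", date(2023, 1, 6), 39.0),
    ("cell_b", date(2023, 1, 5), 72.5),
]


@dataclass
class HybridGraph:
    # (cell, day) identifies a Data Pointer node attached to an H3 node.
    pointers: Set[Tuple[str, date]] = field(default_factory=set)
    # (cell, day, what) -> aggregate value held by an Info Holder node.
    info: Dict[Tuple[str, date, str], float] = field(default_factory=dict)


def create_information(graph: HybridGraph, rows: Iterable[tuple], cells: Set[str],
                       start: date, end: date, what: str = "max(speed)") -> None:
    """Phase 1: aggregate per cell and day, then write pointer and info nodes."""
    per_day: Dict[Tuple[str, date], float] = {}
    for cell, day, speed in rows:  # stands in for the time-series query
        if cell in cells and start <= day <= end:
            key = (cell, day)
            per_day[key] = max(per_day.get(key, speed), speed)
    for (cell, day), value in per_day.items():
        graph.pointers.add((cell, day))
        graph.info[(cell, day, what)] = value


def get_information(graph: HybridGraph, cells: Set[str], start: date, end: date,
                    what: str = "max(speed)") -> Optional[float]:
    """Phase 2: graph-only lookup along H3 -> Data Pointer -> Info Holder."""
    values = [
        graph.info[(cell, day, what)]
        for (cell, day) in graph.pointers
        if cell in cells and start <= day <= end and (cell, day, what) in graph.info
    ]
    return max(values) if values else None


graph = HybridGraph()
window = (date(2023, 1, 5), date(2023, 1, 6))
create_information(graph, ROWS, {"cell_a"}, *window)
result = get_information(graph, {"cell_a"}, *window)
```

In this sketch, retrieval touches one stored value per cell and day rather than every raw sample, which is the property the hybrid design relies on for large contexts.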

The operational mechanisms of the two approaches under discussion are visually represented in Figure 3.

3.4. Data Analysis

Statistical analysis measures such as mean and standard deviation were employed to evaluate the performance data. These foundational statistics provided a straightforward yet effective means of assessing central tendencies and variability within the database systems’ performance metrics.

4. Results

In this section, we present the outcomes of our research. Following the methodology described previously, the results are divided into three parts: the setup of the infrastructure, the arrangement of the tests, and the main findings.

4.1. Infrastructure Setup

The infrastructure setup for our study was executed entirely on AWS to replicate a production-ready environment. This section details the specific parameters of AWS Neptune, which was the primary database service used. The Neptune database was configured with engine version 1.2.1.0, set to operate in a serverless manner. This allowed for dynamic scalability, with the minimum Neptune Compute Units (NCUs) set at 2.5 and the maximum at 128. NCUs serve as a measurement of computing capacity within the Neptune database system. This unit is analogous to the traditional CPU (Central Processing Unit) but is tailored to the specific computational requirements of Neptune’s graph database service. This configuration was chosen to ensure optimal performance and scalability in line with the demands of our research.

In the hybrid setup, AWS Timestream was used as the time-series database. It was configured in on-demand write and query mode using default table settings and retention policies. All data writes and queries were handled via the native WriteRecords and Query APIs.

For the performance evaluation, two query types were executed:

In the graph-only (naive) approach, data retrieval was performed using Gremlin traversal queries, where the system traversed from H3 nodes to data nodes, applying filters for specific date ranges and computing aggregates such as maximum speed.
In the hybrid model, the process was divided into two phases. The create information phase used Timestream SQL queries to compute aggregates (e.g., MAX(speed)) over time and location intervals. The results were then written into the graph database as Info Holder nodes. In the get information phase, only Gremlin queries were used to retrieve these precomputed results from the graph database.
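To make the two query types more concrete, the sketch below assembles query text of the kind described above. The exact queries used in the study are not given, so everything here is an assumption: the vertex labels (`h3`, `data`), the edge label `contains` and its direction, the property names, the database and table names, and the storage of received_date as an ISO-8601 string. The functions only build strings and do not contact any service.

```python
from typing import Sequence


def _quoted(values: Sequence[str]) -> str:
    """Render values as single-quoted literals (crude: simply drops embedded quotes)."""
    return ", ".join("'" + v.replace("'", "") + "'" for v in values)


def naive_max_speed_gremlin(cells: Sequence[str], start_iso: str, end_iso: str) -> str:
    """Graph-only approach: traverse H3 nodes -> data nodes, filter by date, aggregate."""
    return (
        "g.V().hasLabel('h3')"
        f".has('h3_index', within({_quoted(cells)}))"
        ".out('contains').hasLabel('data')"
        f".has('received_date', between('{start_iso}', '{end_iso}'))"
        ".values('speed').max()"
    )


def timestream_max_speed_sql(cells: Sequence[str], start_iso: str, end_iso: str,
                             database: str = "mobility",
                             table: str = "telemetry") -> str:
    """Hybrid 'create information' phase: daily MAX(speed) per cell."""
    return (
        "SELECT h3_index, bin(time, 1d) AS day, MAX(speed) AS max_speed "
        f'FROM "{database}"."{table}" '
        f"WHERE h3_index IN ({_quoted(cells)}) "
        f"AND time BETWEEN from_iso8601_timestamp('{start_iso}') "
        f"AND from_iso8601_timestamp('{end_iso}') "
        "GROUP BY h3_index, bin(time, 1d)"
    )


gremlin = naive_max_speed_gremlin(["cell_a", "cell_b"], "2023-01-01", "2023-01-16")
sql = timestream_max_speed_sql(["cell_a", "cell_b"],
                               "2023-01-01T00:00:00Z", "2023-01-15T23:59:59Z")
```

String interpolation is used here only for readability; production code would normally pass user-supplied values through parameter bindings instead.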

The primary performance metric was query execution time, measured in seconds. Each query test was repeated three times, and the average and standard deviation were reported. Measurements were captured using application-level logging with timestamps (Python 3.12), as well as AWS CloudWatch Logs for validation. The runtime metric includes the full query execution duration, excluding the time required for database initialization or infrastructure provisioning.
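A minimal sketch of such an application-level measurement is shown below. The helper name `time_query` is hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not state whether the sample or population formula was used.

```python
import statistics
import time
from typing import Callable, Tuple


def time_query(run_query: Callable[[], object], repeats: int = 3) -> Tuple[float, float]:
    """Run a query several times and return (mean, sample standard deviation) in seconds."""
    durations = []
    for _ in range(repeats):
        started = time.perf_counter()  # monotonic, high-resolution clock
        run_query()
        durations.append(time.perf_counter() - started)
    # stdev needs at least two measurements; three repeats satisfy this.
    return statistics.mean(durations), statistics.stdev(durations)


# Placeholder workload standing in for a real database call.
mean_s, std_s = time_query(lambda: sum(range(100_000)))
```

Only the call itself is timed, which matches the stated exclusion of database initialization and infrastructure provisioning.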

4.2. Tests Setup

In this section, we describe the setup of our tests, each of which was defined within a specific context. The context for each test is determined by three parameters: What, When, and Where. The ‘What’ parameter refers to the aspect to be measured, encompassing both the parameter and the aggregate function applied: for instance, the maximum speed. ‘When’ specifies the time range for the data, such as 1–15 January 2023. ‘Where’ defines the geographical area for the test, delineated by the GPS border positions of a hexagon, which is then converted into the set of H3 polygons lying within it.

Four distinct contexts exist for the tests, labeled S, M, L, and XL. These labels indicate the relative size or scope of each context. Three parameters define context:

What: selected parameter and aggregation
When: date range
Where: H3 polygons

For our tests, we selected the maximum speed for the ‘What’ parameter. The respective sizes of the contexts are given in Table 3.

4.3. Main Findings

To ensure better precision of our findings, each context within our study was tested three times. Table 4 and Table 5 present the results as mean values and standard deviations (in seconds), rounded to four decimal places. The “Create Information” phase in the Hybrid Approach includes querying the time-series database, computing aggregates, and writing results to the graph database.

Building upon this foundation, our study revealed insights during the comparison of the ‘get information’ phases across different database approaches. The Graph Database Only approach exhibited superior speed in smaller contexts, labeled as S and M, highlighting its efficiency in handling queries within a smaller data scope. In contrast, for larger contexts, specifically L and XL, the Hybrid Approach’s ‘get information’ phase was significantly faster, indicating its effectiveness in managing extensive data queries.

However, it is important to note the Hybrid Approach’s requirement for a preliminary ‘create information’ phase, which must be executed once before the first ‘get information’ operation. This prerequisite may influence the approach’s overall time efficiency, especially when frequent database updates are necessary. Despite this, the enhanced performance in larger contexts emphasizes the Hybrid Approach’s potential in specific data management scenarios.

Our study used data from a single source covering one month. With additional data sources, a longer time span, and a larger geographical area, the results would likely differ substantially. The Naive Approach performs adequately on small datasets such as those in the S and M contexts, but its query time grew steeply in the L and XL contexts, which suggests that it may become impractical as data volume increases further.

The Hybrid Approach, by contrast, appears better suited to large and diverse data. Although it requires an initial step (the ‘create information’ phase), this overhead may be acceptable given its performance on large data volumes.

Our findings indicate that a hybrid database architecture is well-suited for managing large-scale datasets with diverse spatial and temporal dimensions. This observation aligns with prior research highlighting the scalability and flexibility of hybrid systems in spatiotemporal data integration and analytics [32,33]. However, challenges such as increased management complexity and data sprawl have also been documented [34,35]. Further research involving broader datasets and real-time electric vehicle telemetry is necessary to validate the generalizability of this approach.

5. Discussion

Our research contributes to the field by providing empirical evidence on the efficacy of different database architectures in managing transportation data. It bridges a gap in existing literature by offering a comparative analysis of these systems in a real-world context, using data from the Uniqway project.

Moreover, by extending the dataset structure to include energy-related variables such as battery state-of-charge and charging logs, our architecture lays the foundation for a unified analysis of transportation and energy consumption. This alignment is particularly crucial for electric vehicle applications, where route planning, battery usage, and charging behavior are interdependent. Future work with real-world EV telemetry will enable deeper insights into optimizing mobility and energy efficiency simultaneously.

Building on these contributions, the following discussion interprets and analyzes the results of our study within the context of its research objectives and the broader framework of existing literature. Our study primarily focused on evaluating the efficiency of different data storage strategies for transportation data, a domain inherently reliant on graph-oriented databases. However, the implications of our findings extend to EV data, which involves high-frequency battery telemetry (e.g., state-of-charge and temperature) and complex charging station interactions. These additional EV attributes further underscore the value of scalable databases that manage both time-series updates and graph relationships in real time. The results, particularly in the context of the ‘get information’ phase, provide meaningful insights into the practical applications of these database systems for both conventional and EV-focused transportation scenarios.

In smaller contexts (S and M), the Graph Database Only approach demonstrated greater speed, aligning with previous literature findings that emphasize graph databases’ efficiency in handling less complex data sets [19,20]. This reinforces the notion that graph databases are well-suited for specific tasks, particularly those involving less extensive data networks [26].

Conversely, in larger contexts (L and XL), the Hybrid Approach, integrating graph and time-series databases, showed a marked increase in speed during the ‘get information’ phase. This aligns with the recent studies by [27,28], which highlight the benefits of combining different database types for more efficient data handling. This finding is significant as it addresses our central research question, demonstrating the potential of hybrid databases in managing large-scale, graph-oriented transportation data, thereby extending the existing body of knowledge in this domain.

However, a drawback of the Hybrid Approach is the necessity of a ‘create information’ phase, which, while only required once before the initial ‘get information’ operation, could reduce overall efficiency in scenarios requiring frequent database updates. This highlights a potential limitation in the adaptability of the Hybrid Approach in specific operational contexts.

In addition to the technical performance gains demonstrated in our experiments, this study presents a notable scientific contribution through the design and evaluation of a hybrid data architecture that unifies time-series and graph-based storage mechanisms. Unlike prior research that typically focuses on one data model in isolation, our work introduces an integrated, scalable solution tailored specifically for electric vehicle (EV) data systems. This hybrid model bridges two complementary paradigms to support both real-time telemetry and relational network structures, enabling advanced use cases such as dynamic route optimization, battery health monitoring, and charging infrastructure analysis.

Compared to prior studies, such as [19,21], which evaluated graph database performance on benchmark networks, our results show that while graph databases are effective in small-to-medium contexts, their performance significantly degrades with larger datasets. Similarly, previous hybrid models [27,28] demonstrated the conceptual benefits of combining relational and graph approaches but lacked empirical validation using real-world transportation data. In contrast, our study applies a hybrid graph–time-series architecture to an operational urban mobility dataset and quantifies its performance across four scaling contexts. The hybrid model demonstrated up to 64% improvement in query execution time compared to the graph-only approach in the XL dataset, highlighting its scalability and practical value for real-time data processing in electric mobility systems.

Beyond its immediate application, the hybrid architecture offers broad transferability and long-term impact. Its flexible design can be adapted to diverse transportation domains, including smart public transit systems, autonomous fleet management, and Mobility as a Service (MaaS) platforms. As cities increasingly shift toward data-driven mobility planning, the ability to efficiently handle high-frequency and interconnected datasets provides a foundation for sustainable, energy-efficient, and intelligent transport solutions. Thus, this work not only delivers empirical benchmarks but also contributes a reusable and extensible framework to the field of transportation informatics.

Limitations and Future Work

While this study provides valuable insights into the performance of hybrid and graph-only database architectures, it also has several limitations. The evaluation was conducted using data from a single mobility service (Uniqway) over a one-month period, which may not fully represent the diversity or scale of real-world EV data ecosystems. Additionally, the extended EV-related parameters (e.g., battery state-of-charge, charging sessions) were conceptually modeled but not yet tested with real telemetry data.

Future research should focus on validating this hybrid architecture using real-time EV telemetry from multiple sources, longer time periods, and larger-scale deployments. Further development could also explore dynamic update scenarios, integration with machine learning pipelines, and extensions to support predictive analytics and real-time decision support for smart mobility systems.

6. Conclusions

This study has systematically explored the efficiency of various database storage strategies for managing transportation data, a sector characterized by its reliance on graph-oriented structures. Our research highlighted the distinct advantages and limitations of both graph-only and hybrid (graph and time-series) database systems through a quantitative evaluation across multiple data contexts.

For smaller data contexts (S and M), the graph-only approach was faster. Specifically, in the S context (14 H3 polygons over 3 days), it achieved an average query runtime of 0.3278 ± 0.0738 s, compared to 0.6886 ± 0.0538 s for the hybrid system. In the M context, the graph-only model was only marginally faster than the hybrid model (1.7912 ± 0.0768 s versus 1.8228 ± 0.0289 s), a difference smaller than the reported variation. These results reaffirm the suitability of graph databases for handling small-scale or less complex data networks.

However, in larger contexts (L and XL), the hybrid approach significantly outperformed the graph-only model. In the L context (over 94,000 H3 polygons), the hybrid model completed the query in 14.9080 ± 1.4579 s, while the graph-only system required 41.3642 ± 4.0808 s. In the XL context (more than 151,000 polygons), the hybrid system reduced query time to 31.8433 ± 0.3965 s, compared to 88.0410 ± 3.9076 s for the naive (graph-only) approach. These results demonstrate the hybrid model’s scalability and efficiency in high-volume, high-frequency data environments.
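The relative differences implied by these runtimes can be checked with a few lines of arithmetic. The sketch below uses only the mean runtimes reported in Table 5 and ignores the reported variation; the function and variable names are illustrative and are not taken from the study's implementation.

```python
# Mean query runtimes in seconds, copied from Table 5.
# The dictionary layout and all names are illustrative assumptions.
RUNTIMES = {
    "S": {"hybrid": 0.6886, "naive": 0.3278},
    "M": {"hybrid": 1.8228, "naive": 1.7912},
    "L": {"hybrid": 14.9080, "naive": 41.3642},
    "XL": {"hybrid": 31.8433, "naive": 88.0410},
}


def relative_reduction(naive: float, hybrid: float) -> float:
    """Fraction of the naive runtime saved by the hybrid approach.

    A negative value means the hybrid approach was slower.
    """
    return (naive - hybrid) / naive


def summarize(runtimes: dict) -> dict:
    """Map each context to its runtime reduction in percent (one decimal)."""
    return {
        ctx: round(100 * relative_reduction(t["naive"], t["hybrid"]), 1)
        for ctx, t in runtimes.items()
    }


if __name__ == "__main__":
    for ctx, pct in summarize(RUNTIMES).items():
        print(f"{ctx}: {pct:+.1f}%")
```

With these means, the reduction is roughly 64% in both the L and XL contexts, while in the S context the hybrid query takes about twice as long as the graph-only query.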

While the hybrid model introduces an additional “create information” phase (e.g., 276.4272 ± 7.8580 s in the XL context), this cost is offset by its significantly faster performance during repeated “get information” queries. Based on the mean runtimes, each XL query saves about 56 s, so the one-time cost is recovered after roughly five queries. This trade-off is particularly beneficial in long-term EV data systems, where queries are performed far more often than data is initially processed.
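The trade-off can be made concrete as a break-even count: the number of queries after which the time saved per query exceeds the one-time preparation cost. The following is a minimal sketch using the mean values from Tables 4 and 5. It assumes that the “create information” phase runs once and that every later query costs the mean runtime; data updates that would require re-running the phase are not modeled. The function name and data layout are illustrative.

```python
import math
from typing import Optional

# (create_s, naive_query_s, hybrid_query_s): means from Tables 4 and 5.
# Layout and names are illustrative assumptions, not the study's code.
CONTEXTS = {
    "S": (3.2189, 0.3278, 0.6886),
    "M": (9.4585, 1.7912, 1.8228),
    "L": (165.4485, 41.3642, 14.9080),
    "XL": (276.4272, 88.0410, 31.8433),
}


def break_even_queries(create_s: float, naive_s: float, hybrid_s: float) -> Optional[int]:
    """Smallest number of queries for which the hybrid total time
    (one-time create phase plus queries) is no longer than the naive total.

    Returns None when the hybrid query is not faster, so the
    preparation cost is never recovered.
    """
    saving_per_query = naive_s - hybrid_s
    if saving_per_query <= 0:
        return None
    return math.ceil(create_s / saving_per_query)


if __name__ == "__main__":
    for ctx, (create_s, naive_s, hybrid_s) in CONTEXTS.items():
        print(ctx, break_even_queries(create_s, naive_s, hybrid_s))
```

Under these assumptions the preparation cost pays off after a handful of queries in the L and XL contexts, whereas in the S and M contexts it is never recovered because the graph-only query is already at least as fast.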

These findings extend the existing body of knowledge in transportation data management by providing concrete, empirical benchmarks for choosing between database strategies. Furthermore, the study’s implications are particularly relevant to EV applications, where high-frequency, time-sensitive data (such as battery health metrics, charging logs, and geospatial mobility patterns) require both scalable and performant storage solutions.

In practical terms, the hybrid approach supports a wide range of electric vehicle management applications. For example, by efficiently linking time-series data (such as battery state-of-charge or charging cycles) with spatial relationships (e.g., charger locations, route networks), fleet operators can implement real-time battery monitoring, optimize energy usage based on travel behavior, and predict charging demand across geographies. Municipalities and service providers can also use the model to analyze mobility trends over time and space, enabling smarter decisions in urban planning, charger deployment, and traffic optimization. These capabilities highlight the system’s adaptability and value in supporting data-driven decision-making in real-world electric mobility scenarios.
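To illustrate what linking time-series data with spatial relationships can look like, the following is a purely illustrative in-memory toy, not the databases or schema used in this study. It keeps a per-car time series in one layer and a graph of spatial cells in another, then answers a combined question: which cars are low on charge and currently in, or next to, a cell with a charger. The field names `received_date`, `car_id`, and `battery_soc` follow Table 1 (`battery_soc` was not part of the testing dataset); the `cell` field, the class names, and the placeholder cell identifiers are assumptions standing in for real H3 indexes.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Sample:
    """One telemetry record; `cell` is a hypothetical spatial cell id."""
    received_date: datetime
    car_id: int
    cell: str           # placeholder id standing in for an H3 index
    battery_soc: float  # state of charge in percent


class ToyHybridStore:
    """Illustrative two-layer store: per-car time series plus a cell graph."""

    def __init__(self) -> None:
        self._series = defaultdict(list)   # time-series layer: car_id -> samples
        self._adjacent = defaultdict(set)  # graph layer: cell -> neighbouring cells
        self._charger_cells = set()        # cells that contain a charger

    def connect(self, a: str, b: str) -> None:
        """Add an undirected edge between two cells."""
        self._adjacent[a].add(b)
        self._adjacent[b].add(a)

    def add_charger(self, cell: str) -> None:
        self._charger_cells.add(cell)

    def append(self, sample: Sample) -> None:
        self._series[sample.car_id].append(sample)

    def latest(self, car_id: int) -> Sample:
        """Most recent sample of a car, by timestamp."""
        return max(self._series[car_id], key=lambda s: s.received_date)

    def low_battery_near_charger(self, threshold: float) -> list:
        """Cars below `threshold` whose current cell has, or borders, a charger."""
        result = []
        for car_id in sorted(self._series):
            last = self.latest(car_id)
            reachable = {last.cell} | self._adjacent.get(last.cell, set())
            if last.battery_soc < threshold and reachable & self._charger_cells:
                result.append(car_id)
        return result


if __name__ == "__main__":
    store = ToyHybridStore()
    store.connect("cell-a", "cell-b")
    store.add_charger("cell-b")
    noon = datetime(2023, 1, 1, 12, 0)
    store.append(Sample(noon, 1, "cell-a", 15.0))  # low, next to a charger
    store.append(Sample(noon, 2, "cell-c", 10.0))  # low, no charger nearby
    store.append(Sample(noon, 3, "cell-b", 80.0))  # at a charger, not low
    print(store.low_battery_near_charger(20.0))
```

The point of the split is that the temporal lookup (latest sample per car) and the relational lookup (cell adjacency and charger placement) are each served by the structure suited to it; a production system would delegate them to a time-series database and a graph database respectively.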

In conclusion, the hybrid graph–time-series model emerges as a powerful and scalable solution for managing large-scale electric mobility data, enabling advanced use cases such as dynamic route planning, charging infrastructure optimization, and battery degradation analysis. Beyond its practical advantages, this work contributes a novel and adaptable data architecture that can serve as a foundation for future research and development in transportation informatics, particularly where real-time, high-frequency, and interconnected data streams are critical.

Author Contributions

The contributions of the authors to the research project and manuscript preparation were significant and varied. M.Š. was instrumental in conceptualizing the study and took the lead in drafting the manuscript, providing the foundation for the research. F.R. played a crucial role in conducting the data analysis and made significant contributions to interpreting the results, ensuring the data’s relevance and accuracy were effectively communicated. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Grant Agency of the Czech Technical University in Prague, grant No. SGS22/129/OHK2/2T/16.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

All registered users of the Uniqway service had provided explicit consent for their data to be used for research purposes.

Data Availability Statement

The data utilized in this research were provided exclusively for the purposes of academic inquiry within the university. Consequently, the dissemination of these datasets in their original form is restricted and cannot be made publicly available.

Acknowledgments

The authors would like to acknowledge the support provided by the Czech Technical University in Prague. During the preparation of this work, the authors used ChatGPT (GPT-4 Turbo) and Elicit (20.2.2025) for language correction and literature review. After using these tools, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

EV	Electric vehicle
AWS	Amazon Web Services
CPU	Central Processing Unit
FAIR	Findable, Accessible, Interoperable, and Reusable
GPS	Global Positioning System
H3	H3 geospatial indexing system
NCUs	Neptune Compute Units
S	Small context size
M	Medium context size
L	Large context size
XL	Extra Large context size
JSON	JavaScript Object Notation

References

1. Meyer, M.D.; Miller, E.J. Urban Transportation Planning: A Decision-Oriented Approach; McGraw-Hill: New York, NY, USA, 2001.
2. Anas, A. On the gravity model in transportation analysis: Theory and extensions. Transportation 1998, 25, 265–278.
3. Bhat, C.R.; Guo, J.Y. Recent advances in discrete choice modeling and the implications for transportation planning. Transp. Res. Part B Methodol. 2007, 41, 903–931.
4. Sheffi, Y. Urban Transportation Networks: Equilibrium Analysis with Mathematical Programming Methods; Prentice Hall: Upper Saddle River, NJ, USA, 1985.
5. Mahmassani, H.S. 50th Anniversary Invited Article – Autonomous Vehicles and Connected Vehicle Systems: Flow and Operations Considerations. Transp. Sci. 2016, 50, 1140–1162.
6. Gao, Y.; Qu, Z.; Song, X.; Yun, Z. Modeling of urban road network traffic carrying capacity based on equivalent traffic flow. Simul. Model. Pract. Theory 2022, 115, 102462.
7. Ceder, A. Public Transit Planning and Operation: Theory, Modeling and Practice; Elsevier: Amsterdam, The Netherlands, 2007.
8. Agatz, N.; Erera, A.; Savelsbergh, M.; Wang, X. Optimization for dynamic ride-sharing: A review. Eur. J. Oper. Res. 2008, 223, 295–307.
9. Litman, T. Evaluating public transportation health benefits. J. Transp. Health 2019, 12, 54–61.
10. Jakob, M.; Moler, Z. Modular framework for simulation modelling of interaction-rich transport systems. In Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013), The Hague, The Netherlands, 6–9 October 2013; pp. 2152–2159.
11. Kamargianni, M.; Matyas, M.; Li, W. An integrated framework for modelling and evaluating the impacts of Mobility-as-a-Service (MaaS) systems. Transp. Rev. 2016, 36, 601–623.
12. Rindone, C.; Vitetta, A. Measuring Potential People’s Acceptance of Mobility as a Service: Evidence from Pilot Surveys. Information 2024, 15, 333.
13. Hensher, D.A. Modelling and Forecasting Travel Demand and the Effect of Travel Policy; Elsevier: Amsterdam, The Netherlands, 2014.
14. Shaheen, S.; Cohen, A. Carsharing and personal vehicle services: Worldwide market developments and emerging trends. Int. J. Sustain. Transp. 2013, 7, 5–34.
15. Darshana Shimpi, S.C. An Overview of Graph Databases. In Proceedings of the International Conference on Recent Trends in Information Technology and Computer Science 2012, ICRTITCS2012, Mumbai, India, 17–18 December 2012; pp. 16–22.
16. Jain, R.; Iyengar, S.; Arora, A. Overview of popular graph databases. In Proceedings of the 2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT), Tiruchengode, India, 4–6 July 2013; pp. 1–6.
17. Flesca, S.; Greco, S. Querying Graph Databases. In Proceedings of the Advances in Database Technology – EDBT 2000, Konstanz, Germany, 27–31 March 2000; Zaniolo, C., Lockemann, P.C., Scholl, M.H., Grust, T., Eds.; pp. 510–524.
18. Angles, R.; Gutiérrez, C. Survey of graph database models. ACM Comput. Surv. 2008, 40, 1–39.
19. Dominguez-Sal, D.; Urbón-Bayes, P.; Giménez-Vañó, A.; Gómez-Villamor, S.; Martínez-Bazán, N.; Larriba-Pey, J.L. Survey of Graph Database Performance on the HPC Scalable Graph Analysis Benchmark. In Proceedings of the Web-Age Information Management, Jiuzhaigou, China, 15–17 July 2010; Shen, H.T., Pei, J., Özsu, M.T., Zou, L., Lu, J., Ling, T.W., Yu, G., Zhuang, Y., Shao, J., Eds.; pp. 37–48.
20. Morishima, S.; Matsutani, H. Performance Evaluations of Graph Database Using CUDA and OpenMP Compatible Libraries. SIGARCH Comput. Archit. News 2014, 42, 75–80.
21. McColl, R.C.; Ediger, D.; Poovey, J.; Campbell, D.; Bader, D.A. A performance evaluation of open source graph databases. In Proceedings of the First Workshop on Parallel Programming for Analytics Applications, New York, NY, USA, 16 February 2014; PPAA ’14. pp. 11–18.
22. Mpinda, S.A.T.; Ferreira, L.C.; Ribeiro, M.X.; Santos, M.T.P. Evaluation of graph databases performance through indexing techniques. Int. J. Artif. Intell. Appl. (IJAIA) 2015, 6, 87–98.
23. Czerepicki, A. Application of graph databases for transport purposes. Bull. Pol. Acad. Sci. Tech. Sci. 2016, 64, 457–466.
24. Šidlovský, M.; Ravas, F. Building knowledge graph in the transportation domain. In Proceedings of the 2023 Smart City Symposium Prague (SCSP), Prague, Czech Republic, 25–26 May 2023; pp. 1–4.
25. Chen, J.; Song, Q.; Zhao, C.; Li, Z. Graph Database and Relational Database Performance Comparison on a Transportation Network. In Proceedings of the Advances in Computing and Data Sciences, Valletta, Malta, 24–25 April 2020; Singh, M., Gupta, P.K., Tyagi, V., Flusser, J., Ören, T., Valentino, G., Eds.; pp. 407–418.
26. Miler, M.; Medak, D.; Odobašić, D. The Shortest Path Algorithm Performance Comparison in Graph and Relational Database on a Transportation Network. Promet-Traffic Transp. 2014, 26, 75–82.
27. Vyawahare, H.; Karde, P.; Thakare, V. A Hybrid Database Approach Using Graph and Relational Database. In Proceedings of the 2018 International Conference on Research in Intelligent and Computing in Engineering (RICE), San Salvador, El Salvador, 22–24 August 2018; pp. 1–4.
28. Grund, M.; Cudré-Mauroux, P.; Krüger, J.; Plattner, H. Hybrid graph and relational query processing in main memory. In Proceedings of the 2013 IEEE 29th International Conference on Data Engineering Workshops (ICDEW), Brisbane, QLD, Australia, 8–12 April 2013; pp. 23–24.
29. Šidlovský, M.; Ravas, F.; Jirovský, V. Uniqway-students’ carsharing project transforms mobility. In Proceedings of the FISITA 2021 World Congress, London, UK, 14–16 September 2021.
30. Uber Technologies, Inc. H3: Hexagonal Hierarchical Geospatial Indexing System. 2025. Available online: https://github.com/uber/h3 (accessed on 23 March 2025).
31. Amazon Web Services. Overview of Amazon Web Services. 2025. Available online: https://docs.aws.amazon.com/whitepapers/latest/aws-overview/aws-overview.pdf (accessed on 23 March 2025).
32. Zheng, Y.; Huang, Q.; Yuan, Y.; Wang, S. Design and implementation of a GIS platform architecture for spatiotemporal big data. Future Gener. Comput. Syst. 2018, 87, 256–268.
33. Islam, A. Hybrid Cloud Databases for Big Data Analytics: A Review of Architecture, Performance, and Cost Efficiency. Int. J. Manag. Inf. Syst. Data Sci. 2024, 1, 96–114.
34. Fortinet. 5 Challenges with Hybrid and Hyperscale Data Center Security and How to Solve Them. 2021. Available online: https://www.fortinet.com/content/dam/fortinet/assets/checklists/checklist-5-challenges-with-hybrid-and-hyperscale-data-center.pdf (accessed on 23 March 2025).
35. Cao, G. Deep Learning of Big Geospatial Data: Challenges and Opportunities. In New Thinking in GIScience; Li, B., Shi, X., Zhu, A.X., Wang, C., Lin, H., Eds.; Springer Nature: Singapore, 2022; pp. 159–169.

Figure 1. Diagram of nodes and arcs in the graph-database-only approach.

Figure 2. Diagram of nodes and arcs in the hybrid approach.

Figure 3. Operational mechanics: comparison of the naive and hybrid approaches.

Table 1. Parameters used and potential extensions.

Parameter	Datatype	Testing Dataset
received_date	Date	Yes
gps_alt	Float	Yes
gps_lat	Float	Yes
gps_lon	Float	Yes
locked	Boolean	Yes
speed	Float	Yes
car_id	Integer	Yes
battery_soc	Float	No
battery_soh	Float	No
charging_power	Float	No
charging	Boolean	No
env_temp	Float	No

Table 2. Descriptive analysis of data.

Parameter	Value
Total Number of Data Records	4,623,271
Time Range	January 2023
Total Number of Unique Cars	37
Max Speed	202

Table 3. Selected parameters of the dataset contexts.

Context	Date Range	H3 Polygons Count
S	1–3 January 2023	14
M	1–7 January 2023	2530
L	1–15 January 2023	94,507
XL	1–25 January 2023	151,610

Table 4. Runtime of the “create information” phase on different dataset contexts for the hybrid approach (HA).

Context	HA Create Information (s)
S	3.2189 ± 0.6449
M	9.4585 ± 1.2732
L	165.4485 ± 4.1787
XL	276.4272 ± 7.8580

Table 5. Runtime comparison on different dataset contexts.

Context	Hybrid Approach (s)	Naive Approach (s)
S	0.6886 ± 0.0538	0.3278 ± 0.0738
M	1.8228 ± 0.0289	1.7912 ± 0.0768
L	14.9080 ± 1.4579	41.3642 ± 4.0808
XL	31.8433 ± 0.3965	88.0410 ± 3.9076

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Published by MDPI on behalf of the World Electric Vehicle Association. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

