An Artificial Intelligence Enhanced Transfer Graph Framework for Time-Dependent Intermodal Transport Optimization

Anbri, Khalid; El Moufid, Mohamed; Zahidi, Yassine; Dachry, Wafaa; Gziri, Hassan; Medromi, Hicham

doi:10.3390/asi9010010


by Khalid Anbri ^1,2,*, Mohamed El Moufid ^2, Yassine Zahidi ^2, Wafaa Dachry ^1,2, Hassan Gziri ^1,2 and Hicham Medromi ^2,3

^1 Laboratory Engineering and Innovation of Advanced Systems, Faculty of Science and Technology, University Hassan 1st, Settat 539, Morocco
^2 Foundation for Research Development and Innovation in Science and Engineering, Casablanca 20000, Morocco
^3 The International Academy of Scientific Francophone (IASF), Rabat 10100, Morocco
^* Author to whom correspondence should be addressed.

Appl. Syst. Innov. 2026, 9(1), 10; https://doi.org/10.3390/asi9010010 (registering DOI)

Submission received: 9 November 2025 / Revised: 16 December 2025 / Accepted: 17 December 2025 / Published: 26 December 2025

(This article belongs to the Special Issue Advances in Mathematical Models and Computational Intelligence for Transportation System Planning and Management)


Abstract

In the digital era, rapid urban growth and the demand for sustainable mobility are placing increasing pressure on transport systems, where congestion, energy consumption, and schedule variability complicate intermodal journey planning. This work proposes an AI-enhanced transfer-graph framework that models each transport mode as an independent subnetwork connected through explicit transfer arcs. This modular structure captures modal interactions while reducing graph complexity, enabling algorithms to operate more efficiently in time-dependent contexts. A Deep Q-Network (DQN) agent is further introduced as an exploratory alternative to exact and meta-heuristic methods for learning adaptive routing strategies. Exact (Dijkstra) and meta-heuristic (ACO, DFS, GA) algorithms were evaluated on synthetic networks reflecting Casablanca’s intermodal structure, achieving coherent routing with favorable computation and memory performance. The results demonstrate the potential of combining transfer-graph decomposition with learning-based components to support scalable intermodal routing.

Keywords:

intermodal transportation; shortest path problem; artificial intelligence; reinforcement learning; optimization

1. Introduction

The rapid growth of urbanization and the increasing demand for sustainable mobility have placed growing pressure on modern transportation systems. Cities worldwide face rising congestion, environmental concerns, and the need to improve accessibility while ensuring efficiency and user satisfaction. To meet these challenges, intermodal transport systems integrating diverse modes [1,2] such as rail, tram, bus, private vehicles, and urban parking have emerged within intelligent transportation systems (ITS) [3].

Before addressing technical considerations, it is essential to clarify the fundamental concepts underlying intermodal mobility. In the transport literature, multimodality generally refers to the coexistence of several transport modes that users may independently select. Intermodality denotes their coordinated use within one continuous journey involving structured transfers between modes. Although some studies use the term multimodal to describe integrated trips [4], this work adopts the more precise definition in which intermodal transport refers to journeys combining multiple modes in a unified travel chain.

Modern transport networks are also inherently time-dependent, meaning that travel conditions, including vehicle frequencies, travel times, and waiting periods, vary throughout the day based on scheduled operations or traffic dynamics. Managing such variability requires sophisticated decision-making capabilities supported by intelligent transportation systems (ITS), which rely on computational models, real-time data, and optimization strategies to deliver reliable and efficient mobility services.

Designing effective routing solutions for intermodal, time-dependent systems is challenging because passenger journeys must simultaneously balance several criteria, such as travel time, cost, comfort, number of transfers, and schedule synchronization. Traditional shortest-path algorithms, such as Dijkstra’s, although proven and effective in static networks, struggle in real-world multimodal settings where the network is heterogeneous and route feasibility changes dynamically over time [5].

To address these challenges, various graph-based models have been proposed. Graph-based models are particularly relevant because they provide a formal representation of transport infrastructure: vertices denote stations or facilities, edges represent feasible time-dependent connections, and weights encode travel time, cost, or transfer penalties.

Despite their value, existing graph-based models present notable shortcomings. Hypergraphs are expressive but often computationally expensive; hierarchical models simplify computation but may overlook critical intermodal interactions; and heuristic strategies, such as Ant Colony Optimization and Genetic Algorithms, improve scalability but may sacrifice solution accuracy. At the same time, the parallel progress of optimization and learning-based strategies has sparked curiosity about their integration, motivating hybrid frameworks that merge structural rigor with adaptive decision-making.

The goal of this research is to develop a time-dependent intermodal transport solution that achieves an effective balance between computational time and memory requirements, while demonstrating scalability and applicability to realistic scenarios spanning multiple cities, regions, or even countries. The study further seeks to answer the following question: To what extent can a hybrid framework, which integrates transfer-graph–based decomposition with machine learning techniques, particularly reinforcement learning, effectively enhance routing efficiency and adaptability in time-dependent intermodal networks?

To overcome these limitations, the core contribution of this work lies in the development of a unified and operational framework that integrates structural modeling, computational scalability, and adaptive routing for time-dependent intermodal transport networks:

An extended transfer-graph model that captures time-dependent schedules, multi-criteria costs, intermodal transfers, and parking access within a single coherent representation.
A decomposition strategy designed explicitly for the transfer-graph structure, enabling efficient computation by separating intramodal and intermodal searches and avoiding the dimensional growth of time-expanded graphs.
A unified benchmarking environment that compares Dijkstra, ACO, DFS, and GA under identical time-dependent intermodal conditions, and evaluates their performance across network sizes ranging from small to metropolitan-scale graphs. This enables a systematic assessment of accuracy, scalability, and memory efficiency on instances that reflect the structural complexity of real urban transport networks.
An AI-enhanced routing framework that integrates a Deep Q-Network (DQN) with the transfer-graph structure, using action masking, temporal state encoding, and transfer-aware penalties to enable the agent to learn adaptive routing policies under schedule variability. This design also establishes a general interface that can accommodate alternative reinforcement learning methods within the same transfer-graph environment.

The remainder of this paper is structured as follows: Section 2 provides a review of the main contributions in graph-based modeling, heuristic optimization, and reinforcement learning applied to intermodal transport systems. Section 3 formulates the time-dependent intermodal routing problem and introduces the proposed transfer-graph representation. Section 4 presents the optimization framework, including the objective functions, model constraints, and computational complexity. Section 5 describes the algorithmic solution strategy, detailing the implementation of Dijkstra, Ant Colony Optimization, Depth-First Search, and Genetic Algorithm within the transfer-graph model. Section 6 outlines the experimental design and performance evaluation. Section 7 extends the framework by integrating a Deep Q-Network reinforcement learning agent within the transfer graph. Section 8 discusses the results in the context of related literature and intelligent transportation systems. Section 9 concludes the paper and highlights directions for future research.

2. Literature Review

Research on intermodal transport optimization has progressed along three primary directions: graph-based modeling, heuristic and metaheuristic optimization, and learning-based routing methods. Each line of work contributes specific mechanisms for representing multimodal networks, managing their complexity, and addressing the temporal and operational uncertainties inherent in real-world mobility systems.

2.1. Graph-Based Modeling of Intermodal Transport

Graph-based models constitute the structural foundation of intermodal routing, providing a unified mathematical representation of transport infrastructure, schedules, and modal interactions. Foundational studies on multimodal and intermodal systems [1] highlight the importance of integrating heterogeneous modes such as rail, tram, bus, private vehicles, and parking facilities into coherent mobility frameworks.

A variety of graph structures have been proposed to capture different aspects of multimodal networks. Accessibility graphs, introduced by ref. [4], assess spatial reachability and modal connectivity in urban environments. Although valuable for urban planning, these models do not explicitly represent temporal variability or delays.

In contrast, time-dependent graphs, such as those developed by ref. [5], support route computation under varying travel times and scheduled departures. This improves realism but becomes computationally intensive when applied to large-scale multimodal networks.

The colored-edge graph model proposed by ref. [6] encodes transport modes as edge attributes, enabling multi-criteria optimization and the identification of Pareto-efficient paths. However, the combinatorial explosion in feasible paths limits its scalability in dense networks.

The transfer-graph model, introduced by ref. [7], decomposes multimodal networks into unimodal components interconnected via transfer arcs. This significantly reduces graph dimensionality and computation time, and subsequent extensions such as distributed transition-graph schemes [8] demonstrate the model’s effectiveness for large urban systems.

More advanced space–time representations, including the improved ripple-spreading algorithm of ref. [9], incorporate schedule-based temporal propagation and realistic transfer conditions. While these models excel in capturing real operational dynamics, they incur substantial computational overhead, especially in large-scale or densely connected networks.

2.2. Heuristics and Metaheuristics for Intermodal Routing

Heuristic and metaheuristic methods have been widely adopted to overcome the computational challenges of exact graph-based algorithms in complex, time-dependent networks.

Memetic algorithms, such as the approach proposed by ref. [10], combine global exploration and local refinement to yield efficient near-optimal routes in multimodal settings. Broader families of metaheuristics, including Ant Colony Optimization, Genetic Algorithms, and Tabu Search, have been systematically reviewed by ref. [11], which emphasizes their adaptability, scalability, and performance under diverse constraints. However, these methods often require careful parameter tuning and may lack formal guarantees of optimality.

Classical multimodal shortest-path algorithms explored by refs. [12,13] provide foundational problem formulations but are limited in their ability to respond to dynamic updates, delays, and real-time operational changes. To better address uncertainty, ref. [12] introduces a robust multimodal shortest-path model that incorporates interval-based timetable data, improving reliability under fluctuating travel and waiting times. This robustness comes at the cost of increased computational complexity.

Beyond algorithmic considerations, several studies incorporate environmental, capacity, and infrastructure-related constraints. Ref. [14] uses a multimodal framework to evaluate park-and-ride strategies under environmental considerations.

2.3. Deep Learning and Reinforcement Learning for Dynamic Intermodal Routing

Recently, data-driven methods, especially deep learning and reinforcement learning (RL), have emerged as powerful tools for managing complexity, uncertainty, and variability in intermodal transport systems.

Deep learning has been applied to multimodal routing in several works. Refs. [15,16] use neural networks to optimize transshipment in multimodal freight networks and to jointly schedule barges and tugboats in river–sea intermodal transport, demonstrating that learning-based approaches can effectively manage coupled resources and time dependencies, whereas ref. [17] integrates timetable information into time-dependent shortest-path optimization. These studies demonstrate the ability of deep models to extract structural patterns, but they lack explicit mechanisms for real-time decision-making.

Reinforcement learning, by contrast, enables agents to learn adaptive routing strategies through interaction with the environment. Ref. [18] introduces a reward-guided conservative Q-learning approach to coordinate ride-pooling and public transport under uncertain demand. Ref. [19] applies multi-objective Q-learning to multimodal routing under time uncertainty, balancing travel time, cost, and reliability.

Comprehensive surveys by refs. [20,21] underscore RL’s potential in addressing traffic disturbances, real-time decision-making, and multimodal coordination. Ref. [22] shows the feasibility of real-time deep RL navigation, while ref. [23] demonstrates how Q-learning can dynamically regulate traffic systems.

Applications have also expanded to intermodal journey planning. Ref. [24] uses deep RL to coordinate multimodal routes under capacity constraints, and ref. [25] incorporates passenger behavior within RL-based journey planning frameworks. Ref. [26] integrates RL in coordinating ride-sourcing with public transport, highlighting RL’s suitability for environments with high variability and stochastic interactions.

Collectively, these works show that RL methods effectively handle disturbances, delays, travel-time uncertainty, dynamic demand, and complex multimodal dependencies, all of which are shortcomings commonly observed in graph-based and heuristic strategies.

2.4. Synthesis and Research Gap

The literature demonstrates the following:

Graph-based models provide strong structural and temporal representation but struggle with scalability and uncertainty.
Heuristic and metaheuristic approaches offer scalability but may sacrifice optimality and robustness.
Reinforcement learning methods excel in adaptability and dynamic decision-making but often lack explicit structural modeling of transport networks.

Most existing works address only one aspect of the intermodal routing problem: either structural modeling, computational efficiency, or dynamic adaptability, without integrating all three. This gap motivates our approach, which combines a transfer-graph model for structural and computational scalability and a Deep Q-Network (DQN) for real-time adaptive routing, thus unifying these complementary capabilities within a single framework.

3. Materials and Methods

The proposed methodology models the time-dependent intermodal routing problem using a transfer-graph representation, chosen for its scalability and ability to capture heterogeneous modes with temporal variability. Prior works by refs. [4,8,27] have shown that transfer graphs reduce network dimensionality by decomposing multimodal systems into unimodal subgraphs linked through virtual transfer arcs. Additional studies such as ref. [6] support the use of structured graph abstractions for managing time-dependent constraints, providing a strong foundation for the modeling choices adopted here.

This structure also aligns with algorithmic and AI-based advances in multimodal routing. It facilitates intra-modal precomputation [8], integrates naturally with metaheuristic techniques such as memetic algorithms [10] and evolutionary methods [11], and provides a compact state space suitable for reinforcement learning approaches. Recent RL applications to multimodal transport [18,19] highlight the benefits of such structured environments, further supported by the surveys in refs. [20,21]. These combined insights justify the transfer graph as the core modeling framework for the proposed hybrid optimization approach.

The objective of this section is to define the mathematical structure of the intermodal network, describe the modeling assumptions, and clarify how each component contributes to the subsequent algorithmic stages. Notational consistency is maintained throughout.

3.1. Problem Formulation

The intermodal network is modeled as a directed, time-dependent graph G = (V, E, M, P). Vertices represent stations or parking vertices, and edges denote feasible, scheduled trips between them. Each transport mode forms a monomodal subgraph, connected by transfer arcs that enable modal changes. The objective is to find a path that minimizes the total travel cost. This formulation establishes the foundation for the time-dependent shortest path and transfer-graph model.

Let $G = (V, E, M, P)$ be a directed graph representing intermodal transport, where

$V_{j} = \{v_{j1}, \dots, v_{jl}\}$ denotes the set of vertices of mode $j$;
$M = \{m_{1}, \dots, m_{k}\}$ is the set of transportation modes in the city (e.g., train, tram, bus);
$E = \{e_{1}, \dots, e_{l}\}$ is the set of arcs;
$P = \{p_{1}, \dots, p_{s}\}$ denotes the set of parking spaces in the city.

An arc $e_{i} = (v_{jq}, v_{jr})_{m_{j}} \in E$ means that it is possible to go from vertex $v_{jq}$ to vertex $v_{jr}$ using mode $m_{j}$.

A value $f_{r}^{e_{i}}(t_{k})$ is associated with each arc and indicates the travel cost of arc $e_{i}$ when departing at time $t_{k}$, according to user criterion $r$ (distance, duration, etc.).

In the following, we define some concepts that will allow us to present our model and our approach to resolution.

3.1.1. Definition 1: Intermodal Path

Given an intermodal transport graph $G = (V, E, M, P)$, an intermodal path $p_{v_{io}, v_{jd}}$ is a sequence of arcs that allows travel from vertex $v_{io}$ to vertex $v_{jd}$:

$$p_{v_{io}, v_{jd}} = \left((v_{1}, v_{2})_{m_{o}}, (v_{2}, v_{3})_{m_{p}}, \dots, (v_{k-1}, v_{k})_{m_{d}}\right) \tag{1}$$

where

$$\forall i, j \in \{1, \dots, k\}: \quad v_{i}, v_{j} \in V, \quad (v_{i}, v_{i+1})_{m_{i}} \in E, \quad m_{i} \in M, \quad i \neq j \Rightarrow v_{i} \neq v_{j} \tag{2}$$

The last condition requires all vertices on the path to be distinct, so no vertex is visited twice.

3.1.2. Definition 2: Time-Dependent Intermodal Graph

We define $G = (V, E, M, P, T)$ as a time-dependent intermodal graph, where

$V$ is the set of vertices;
$E$ is the set of arcs;
$M$ is the set of modes;
$P$ is the set of parking facilities in the city.

Each arc $e_{i} \in E$ is associated with a set of trips:

$$\tau_{e_{i}} = \{(t_{s1}, t_{a1}), \dots, (t_{sk}, t_{ak})\} \tag{3}$$

such that $|\tau_{e_{i}}| = k \geq 1$ and $t_{sj} \leq t_{aj}$ for all $j \in \{1, \dots, k\}$.

The set of all trips is

$$T = \bigcup_{e_{i} \in E} \tau_{e_{i}} \tag{4}$$
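The trip sets of Definition 2 lend themselves to a simple lookup: given the current time, find the next trip that can still be boarded. The Python sketch below illustrates this under stated assumptions. The names `Trip` and `next_departure` and the sample timetable are hypothetical and do not come from the authors' implementation; trips are assumed to be (departure, arrival) pairs sorted by departure time.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Trip = Tuple[float, float]  # (departure t_s, arrival t_a), with t_s <= t_a


def next_departure(trips: List[Trip], t: float) -> Optional[Trip]:
    """Return the first trip departing at or after time t, or None.

    Assumes `trips` is sorted by departure time (an assumption of this sketch).
    """
    departures = [ts for ts, _ in trips]
    k = bisect_left(departures, t)  # index of first departure >= t
    return trips[k] if k < len(trips) else None


# Hypothetical trip set for one arc, times in hours.
tau_e = [(1.0, 2.0), (3.0, 5.0), (6.0, 7.5)]
```

If trips on an arc can overtake one another, the next departure is not necessarily the earliest arrival, and all feasible trips would have to be compared instead.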

3.1.3. Definition 3: The Cost Function

The vector function

$$f_{r}(p, t_{0}) : E \times T \to R \tag{5}$$

represents the cost of path $p$ starting at time $t_{0}$ according to criterion $r$. $R$ is a set of vectors, each vector representing a criterion.

The cost function integrates three criteria (edge cost, travel time, and comfort) to provide a comprehensive multicriteria evaluation of each path segment.

3.1.4. Definition 4: Shortest Path Problem

Consider a graph $G = (V, E, M, P, T)$, two vertices $e, n \in V$, and a departure time $t_{0} \in \{t_{1}, t_{2}, \dots, t_{l}\}$.

The time-dependent shortest path problem consists of computing a path $p$ from $e$ to $n$ departing at instant $t_{0}$ such that $f(p, t_{0})$ is minimal. This path is called the shortest path.

3.1.5. Definition 5: Transfer Graph

Now consider a graph $G = (V, E, M, P, T)$. The transfer graph is defined as

$$T_{g} = (C, Tr) \tag{6}$$

where

$C = \{C_{1}, C_{2}, \dots, C_{k}\}$ is the set of monomodal graphs.
$Tr$ is the set of transfer arcs that interconnect them.

Each component $C_{i} = (V_{i}, E_{i}, M_{i}, P_{i})$ is defined such that

$$\forall i, j \in \{1, \dots, k\}, \quad i \neq j \Rightarrow M_{i} \neq M_{j} \tag{7}$$

The global sets are given by

$$V = \bigcup_{i=1}^{k} V_{i}, \quad E = \bigcup_{i=1}^{k} E_{i}, \quad M = \bigcup_{i=1}^{k} M_{i}, \quad P = \bigcup_{i=1}^{k} P_{i}, \quad T = \bigcup_{i=1}^{k} T_{i} \tag{8}$$

A transfer arc $tr_{i} = (v_{x}, v_{y})$ represents a transfer from mode $m_{x}$ to mode $m_{y}$ at vertex $v_{x}$ (or $v_{y}$).

The vertices $v_{x}, v_{y} \in T_{V_{i}}$ are called transfer vertices.
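To make Definition 5 concrete, the following Python sketch stores a transfer graph as a dictionary of monomodal components plus a list of arcs linking them. It is only an illustration: the class names `Component` and `TransferGraph`, the method `transfer_vertices`, and the two-mode example are assumptions introduced here, not the data structures used by the authors.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Component:
    """One monomodal graph C_i: a single mode with its own vertices and arcs."""
    mode: str
    vertices: Set[str] = field(default_factory=set)
    arcs: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class TransferGraph:
    """T_g = (C, Tr): monomodal components plus the arcs that link them."""
    components: Dict[str, Component] = field(default_factory=dict)
    # Each entry of Tr is ((mode_x, v_x), (mode_y, v_y)).
    transfers: List[Tuple[Tuple[str, str], Tuple[str, str]]] = field(default_factory=list)

    def add_component(self, comp: Component) -> None:
        # Condition (7): two distinct components never share a mode.
        if comp.mode in self.components:
            raise ValueError(f"mode {comp.mode!r} already has a component")
        self.components[comp.mode] = comp

    def transfer_vertices(self, mode: str) -> Set[str]:
        """Vertices of `mode` that are an endpoint of at least one entry of Tr."""
        result = set()
        for (mx, vx), (my, vy) in self.transfers:
            if mx == mode:
                result.add(vx)
            if my == mode:
                result.add(vy)
        return result


# Hypothetical two-mode network: station B is served by both train and tram.
tg = TransferGraph()
tg.add_component(Component("train", {"A", "B"}, [("A", "B")]))
tg.add_component(Component("tram", {"B", "C"}, [("B", "C")]))
tg.transfers.append((("train", "B"), ("tram", "B")))
```

Keeping each mode in its own `Component` mirrors the modularity argument of the model: a mode can be updated or replaced without touching the others, and only `transfers` needs to change when a new interchange is added.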

3.2. Transfer Graph Model

The transfer graph is a graphical abstraction that represents the time-dependent intermodal transport network in a modular and computationally efficient manner. As highlighted by ref. [8], its principal advantage lies in its ability to reflect the distributed nature of real-world transport information systems by separating each transport mode into an independent unimodal network. This separation allows each subsystem, such as rail, tram, bus, busway, or private vehicle networks, to be maintained, updated, or extended independently, without requiring a global precomputation of the entire network.

Beyond its representational clarity, the transfer-graph model provides several methodological benefits compared to classical layered or time-expanded graph formulations. Layered graphs replicate the network across discrete time intervals, leading to a rapid explosion in graph size and computational complexity. In contrast, the transfer graph maintains a compact structure by modeling each mode as a distinct component and linking these components through transfer arcs. These arcs encode modal transitions (e.g., train → tram, bus → metro, parking → busway) without duplicating the full temporal state space, thereby significantly reducing dimensionality [7,8,27].

In a transfer graph

T_{g}

, two types of paths can be distinguished:

Intra-component path: connects two vertices belonging to the same mode of transportation;
Inter-component path: connects two vertices belonging to different modes of transportation.

Figure 1 presents an example of a transfer graph, where the modes Train, Tramway, Busway, and Bus represent four public transport modes, and the Parking represents the road graph network.

Table 1 provides an illustrative example of the time-dependent data used in the transfer-graph model. For each transport mode, the table lists the directed edges, their departure–arrival time intervals, and the corresponding generalized cost. These two criteria, time and cost, are essential inputs of the model: time determines the temporal feasibility of each connection, while cost represents the multi-criteria evaluation used in the shortest-path computation (travel time, fare, and comfort). Parking transitions are also included as inter-modal arcs with associated costs, enabling the integration of park-and-ride strategies. This table, therefore, illustrates how the transfer graph incorporates intra-modal connections, transfer arcs, and time-dependent constraints, which are later used by the optimization algorithms to compute optimal intermodal routes.

All time intervals (e.g., “1 → 2”, “3 → 5”) represent scheduled departure and arrival times expressed in hours within the normalized time horizon used for the intermodal network simulation.

4. Proposed Approach

The proposed optimization framework details the multi-objective formulation and associated constraints, encompassing flow conservation, parking capacities, and mode consistency. It models the trade-offs between travel time, cost, and comfort through weighted criteria, providing a balanced representation of user preferences. The discussion also analyzes computational complexity, emphasizing the need for efficient solution methods that scale to large intermodal networks. This framework serves as the analytical core for the algorithmic strategies presented in the following section.

The objective function is a multi-objective function. Indeed, by varying the multimodal path between two vertices of the multimodal graph, various parameters (duration, cost, level of comfort) can vary [8,28].

Let $p_{v_{io}, v_{jd}}$ be an intermodal path connecting vertex $v_{io}$ to vertex $v_{jd}$.

The optimal path between a vertex $i$ belonging to mode $m$ and a vertex $j$ belonging to mode $n$, with respect to a criterion $k$, is defined as

$$F_{k} = \min \sum_{i, j, m, n} x_{imjn} \cdot f_{k}^{t_{0}}(x_{imjn}) \tag{9}$$

where

$x_{imjn}$: binary variable indicating whether the arc $e_{imjn}$, connecting vertex $i$ (mode $m$) to vertex $j$ (mode $n$), is used ($x = 1$) or not ($x = 0$).
$f_{k}^{t_{0}}(x_{imjn})$: cost function associated with arc $e_{imjn}$ for criterion $k$, starting at time $t_{0}$.

To simultaneously account for all criteria, we define a global objective function as a linear weighting of the optimal functions for each criterion:

$$F = \min \sum_{k} w_{k} \cdot F_{k} \tag{10}$$

where

$w_{k}$: weighting coefficient reflecting the importance of criterion $k$ for the user, with the weights normalized so that $\sum_{k} w_{k} = 1$.
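As a small worked example of the weighted objective, the Python sketch below evaluates two candidate paths under the same weights. The function name `path_cost`, the criterion names, and all numeric values are hypothetical; the per-arc values are assumed to be already expressed on comparable scales, and comfort is encoded as a "discomfort" score so that every criterion is minimized.

```python
from typing import Dict, List

Criteria = Dict[str, float]


def path_cost(arcs: List[Criteria], weights: Dict[str, float]) -> float:
    """Weighted sum over criteria of the per-criterion totals of one fixed path."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Per-criterion total along the path (the inner sum of Eq. (9) for chosen arcs).
    totals = {k: sum(arc[k] for arc in arcs) for k in weights}
    # Linear weighting of the criteria, as in Eq. (10).
    return sum(weights[k] * totals[k] for k in weights)


weights = {"time": 0.5, "fare": 0.25, "discomfort": 0.25}
path_a = [{"time": 20, "fare": 8, "discomfort": 4},
          {"time": 10, "fare": 4, "discomfort": 0}]
path_b = [{"time": 16, "fare": 20, "discomfort": 4}]

cost_a = path_cost(path_a, weights)  # 0.5*30 + 0.25*12 + 0.25*4 = 19.0
cost_b = path_cost(path_b, weights)  # 0.5*16 + 0.25*20 + 0.25*4 = 14.0
```

With these weights the faster but more expensive `path_b` is preferred; shifting weight from time to fare would reverse the ranking, which is the trade-off the weighting coefficients are meant to expose.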

4.1. Problem Constraints

The optimization is subject to the following constraints:

Flow conservation constraint:

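$$\sum_{i, m, n} x_{imvn} - \sum_{j, m, n} x_{vmjn} = \begin{cases} -1 & \text{if } v = o \text{ (origin vertex)} \\ 0 & \forall v \in V, \; v \neq o, d \\ 1 & \text{if } v = d \text{ (destination vertex)} \end{cases} \tag{11}$$

Here the first sum counts the selected arcs entering vertex $v$ and the second counts those leaving it. The constraint ensures that the path starts at $o$, ends at $d$, and respects flow conservation at every intermediate vertex.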
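The flow conservation condition can be checked mechanically for any candidate set of selected arcs. The Python sketch below does so using the convention "inflow minus outflow equals −1 at the origin, +1 at the destination, and 0 elsewhere". The function name `satisfies_flow_conservation` and the (vertex, mode) node encoding are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Tuple

Node = Hashable  # e.g. a (vertex, mode) pair


def satisfies_flow_conservation(selected: Iterable[Tuple[Node, Node]],
                                origin: Node, destination: Node) -> bool:
    """Check inflow - outflow = -1 at origin, +1 at destination, 0 elsewhere."""
    balance = defaultdict(int)
    for tail, head in selected:   # every listed arc has x = 1
        balance[tail] -= 1        # one unit leaves the tail
        balance[head] += 1        # one unit enters the head
    expected = defaultdict(int)
    expected[origin] -= 1
    expected[destination] += 1
    nodes = set(balance) | set(expected)
    return all(balance[v] == expected[v] for v in nodes)


# Hypothetical journey: train A -> B, transfer at B, tram B -> C.
ok_path = [(("A", "train"), ("B", "train")),
           (("B", "train"), ("B", "tram")),
           (("B", "tram"), ("C", "tram"))]
# Same journey with the transfer arc missing.
broken_path = [(("A", "train"), ("B", "train")),
               (("B", "tram"), ("C", "tram"))]
```

Note that this check alone does not rule out an additional disconnected cycle among the selected arcs; in the formulation that role is played by the cost minimization and the remaining constraints.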

Unique arc usage constraint:

$$\sum_{m, n} x_{imjn} \leq 1, \quad \forall (i, j) \in E, \; (m, n) \in M \times M \tag{12}$$

Ensures that each arc is used at most once for a given transport mode.

Consistency constraint for private modes:

$$\sum_{l} x_{lviv} \geq x_{ivjv}, \quad \forall (i_{v}, j_{v}) \in M_{v} \tag{13}$$

If a user leaves their private vehicle in a parking facility, the remainder of the journey must not include returning to that vehicle [14].

Parking Capacity Constraint:

$$\sum_{u \in U} y_{up} \leq \mathrm{Cap}_{p}, \quad \forall p \in P \tag{14}$$

The number of users parked in a parking facility $p$ cannot exceed its maximum capacity $\mathrm{Cap}_{p}$, where

$$y_{up} \in \{0, 1\}, \quad y_{up} = \begin{cases} 1 & \text{if user } u \text{ parks in facility } p, \\ 0 & \text{otherwise} \end{cases} \tag{15}$$

4.2. Problem Complexity

The problem of determining an optimal path under multiple criteria in an intermodal transport network composed of $v$ vertices and $e$ edges is a multi-objective optimization problem whose complexity grows rapidly with network size and the number of criteria. Instead of a single cost value, multi-criteria algorithms must propagate and compare sets of non-dominated labels, leading to worst-case complexities on the order of $O(r \cdot v \cdot e^{2})$ to $O(r \cdot v \cdot e^{3})$ in dense networks, depending on the dominance-checking and label-propagation strategy [3].

In the literature, the main difficulty lies in maintaining acceptable performance as the network becomes larger, more heterogeneous, and more time-dependent, conditions that are commonly encountered in real multimodal systems [14].
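The cost of multi-criteria search comes largely from maintaining sets of non-dominated labels at each vertex. The Python sketch below shows the basic dominance test and label insertion on which such algorithms rely. It is a generic illustration rather than the procedure analyzed in the paper; the names `dominates` and `insert_label` and the sample (time, fare) labels are hypothetical.

```python
from typing import List, Tuple

Label = Tuple[float, ...]  # one cost per criterion, all to be minimized


def dominates(a: Label, b: Label) -> bool:
    """True if a is no worse than b on every criterion and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def insert_label(front: List[Label], new: Label) -> List[Label]:
    """Return the non-dominated set obtained after offering `new` to `front`."""
    if any(dominates(old, new) or old == new for old in front):
        return front  # the new label brings nothing
    # Keep only the old labels that the new one does not dominate.
    return [old for old in front if not dominates(new, old)] + [new]


# Hypothetical (travel time, fare) labels arriving at one vertex.
front: List[Label] = []
for label in [(30, 5), (25, 8), (35, 4), (26, 9), (25, 8), (20, 8)]:
    front = insert_label(front, label)
```

Each insertion scans the whole current front, so the work per vertex grows with the number of incomparable labels, which is what drives the rapid growth in complexity as criteria are added.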

5. Solution Approach

This section describes the algorithmic mechanisms developed to address the time-dependent intermodal routing problem, introducing the decomposition-based procedure that structures the overall computation into a sequence of coordinated processing stages. Four complementary algorithms, Dijkstra, Ant Colony Optimization, Depth-First Search, and Genetic Algorithm, are adapted to operate within the transfer-graph framework. Their theoretical underpinnings and pseudo-codes are provided to demonstrate how each method balances accuracy, exploration, and computational efficiency.

The shortest path (SP) within this model is computed with a graph-based approach comprising three main stages.

The first stage, referred to as Precalculations, consists of computing and storing part of the results in advance. In the second stage, this information, together with additional query-specific calculations, is used to construct a more compact structure called the Relevant Graph. In the third stage, the shortest path is computed on this new abstraction [8,29].

In this study, four different approaches were tested for computing pre-calculations in the context of the time-dependent transfer graph. Each approach is designed to compute shortest paths between transfer vertices within unimodal components, and the results are then reused to accelerate query-specific computations.

To accelerate the computation of shortest paths in time-dependent intermodal networks, four algorithmic strategies were implemented and compared: Dijkstra’s algorithm [11], Ant Colony Optimization (ACO), Depth-First Search (DFS), and a Genetic Algorithm (GA). Each approach was adapted to operate within the transfer graph framework and tailored for precalculation of intra-component paths, which are then reused for query resolution in the relevant graph.
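The three stages (precalculation inside each component, construction of the relevant graph, and the final query) can be sketched in Python as follows. This is a deliberately simplified illustration and not the authors' implementation: edge costs are assumed static rather than time-dependent, and the function names (`shortest_costs`, `precalculate`, `relevant_graph`), the `mode:vertex` node naming, and the three-mode example are all hypothetical.

```python
import heapq
from typing import Dict, Hashable, List, Tuple

Node = Hashable
WeightedGraph = Dict[Node, List[Tuple[Node, float]]]


def shortest_costs(graph: WeightedGraph, source: Node) -> Dict[Node, float]:
    """Plain Dijkstra on static, non-negative costs (simplifying assumption)."""
    dist = {source: 0.0}
    queue, n = [(0.0, 0, source)], 1   # n is a tie-breaker for the heap
    while queue:
        d, _, u = heapq.heappop(queue)
        if d > dist.get(u, float("inf")):
            continue                   # stale queue entry
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(queue, (d + w, n, v))
                n += 1
    return dist


def precalculate(component: WeightedGraph, transfer_vertices: List[Node]):
    """Stage 1: best costs between transfer vertices inside one component."""
    table = {}
    for u in transfer_vertices:
        dist = shortest_costs(component, u)
        for v in transfer_vertices:
            if v != u and v in dist:
                table[(u, v)] = dist[v]
    return table


def relevant_graph(tables, transfers, query_edges) -> WeightedGraph:
    """Stage 2: compact graph over transfer vertices plus the query endpoints."""
    graph: WeightedGraph = {}
    for table in tables:
        for (u, v), c in table.items():
            graph.setdefault(u, []).append((v, c))
    for u, v, c in list(transfers) + list(query_edges):
        graph.setdefault(u, []).append((v, c))
    return graph


# Hypothetical components (one per mode) and transfer arcs with their costs.
train = {"train:O": [("train:X", 2.0)], "train:X": [("train:B", 3.0)],
         "train:B": [("train:Y", 4.0)]}
tram = {"tram:B": [("tram:Z", 1.0)]}
bus = {"bus:Y": [("bus:D", 1.0)], "bus:Z": [("bus:D", 2.0)]}
tables = [precalculate(train, ["train:B", "train:Y"]),
          precalculate(tram, ["tram:B", "tram:Z"]),
          precalculate(bus, ["bus:Y", "bus:Z"])]
transfers = [("train:B", "tram:B", 1.0), ("train:Y", "bus:Y", 1.0),
             ("tram:Z", "bus:Z", 1.0)]

# Query-specific part: origin to its transfer vertices, transfer vertices to destination.
from_origin = shortest_costs(train, "train:O")
query_edges = [("train:O", v, from_origin[v]) for v in ("train:B", "train:Y")]
query_edges += [("bus:Y", "bus:D", 1.0), ("bus:Z", "bus:D", 2.0)]

# Stage 3: shortest path on the relevant graph.
rg = relevant_graph(tables, transfers, query_edges)
best = shortest_costs(rg, "train:O")["bus:D"]
```

In this toy instance the final search never touches the intermediate train stop `train:X`: only transfer vertices and the two query endpoints appear in the relevant graph, which is where the reduction in search space comes from.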

5.1. Pre-Calculations with Dijkstra Algorithm

Dijkstra’s algorithm is a classical exact method for solving shortest path problems in weighted graphs. In our adaptation in Algorithm 1, the algorithm is executed from each transfer vertex and for all feasible departure times. The procedure initializes vertex costs to infinity, assigns the departure vertex as the source, and iteratively relaxes outgoing edges that are accessible at the current time. Costs are updated when a better arrival time is found, and the candidate edges are stored for further expansion. This ensures that the minimum travel cost to each reachable vertex is progressively identified.

Algorithm 1. Dijkstra Pseudo Code for Precalculation.
Input: Transfer graph Tg
Output: Precomputed shortest paths between transfer vertices
For each component Ck in Tg do
    For each transfer vertex s ∈ TransferVertex(Ck) do
        For each departure time t ∈ DepartTime(s) do
            Initialize component C = (V, E, M, T, TV), source vertex s, and currentTime ← t
            For each vertex u in V do
                cost[u] ← ∞
            End for
            cost[s] ← t
            Set currentVertex ← s
            Mark(currentVertex)
            Repeat
                For each edge e ∈ E⁺(currentVertex) do
                    If accessibleEdge(e, currentTime) then
                        vertex ← destinationVertex(e)
                        If isNotMarked(vertex) then
                            currentCost ← bestArrivalTime(e, currentTime)
                            If currentCost < cost[vertex] then
                                cost[vertex] ← currentCost
                                candidateEdges ← candidateEdges ∪ {e}
                            End if
                        End if
                    End if
                End for
                currentVertex ← unmarked destination of a candidate edge with minimum cost
                currentTime ← cost[currentVertex]
                Mark(currentVertex)
            Until termination condition is satisfied
        End for
    End for
End for
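For a single component, source vertex, and departure time, the inner part of Algorithm 1 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code: vertex selection uses a priority queue (an implementation choice not specified in the pseudocode), the cost of a vertex is its earliest arrival time, and the names `earliest_arrivals` and `Timetable` as well as the sample timetable are hypothetical.

```python
import heapq
from typing import Dict, Hashable, List, Tuple

Vertex = Hashable
# graph[u] = list of (destination, [(departure, arrival), ...]) for each outgoing arc
Timetable = Dict[Vertex, List[Tuple[Vertex, List[Tuple[float, float]]]]]


def earliest_arrivals(graph: Timetable, source: Vertex, t0: float) -> Dict[Vertex, float]:
    """Earliest arrival time at each vertex reachable from `source` when leaving at t0."""
    arrival = {source: t0}
    marked = set()
    queue = [(t0, 0, source)]   # (arrival time, tie-breaker, vertex)
    counter = 1
    while queue:
        t, _, u = heapq.heappop(queue)
        if u in marked:
            continue
        marked.add(u)
        for v, trips in graph.get(u, []):
            if v in marked:
                continue
            # The arc is accessible only if some trip departs at or after time t.
            feasible = [ta for ts, ta in trips if ts >= t]
            if not feasible:
                continue
            best = min(feasible)   # best arrival time over the feasible trips
            if best < arrival.get(v, float("inf")):
                arrival[v] = best
                heapq.heappush(queue, (best, counter, v))
                counter += 1
    return arrival


# Hypothetical timetable inside one component (times in hours).
graph = {
    "A": [("B", [(1, 2), (3, 4)]), ("C", [(1, 5)])],
    "B": [("C", [(2, 3), (4, 6)])],
}
leave_at_1 = earliest_arrivals(graph, "A", 1)
leave_at_2 = earliest_arrivals(graph, "A", 2)
```

Leaving at time 1 reaches C at time 3 via B, whereas leaving at time 2 misses the direct trip and the first A-B trip and reaches C only at time 6. This dependence on the departure time is why the precalculation is repeated for every feasible departure time of each transfer vertex.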

5.2. Pre-Calculations with Ant Colony Optimization Algorithm

ACO is a bio-inspired metaheuristic based on the foraging behavior of ants (Algorithm 2). In the precalculation phase, artificial ants iteratively explore the transfer graph from each transfer vertex, guided by pheromone trails and heuristic visibility [11]. Paths are classified, decomposed, and stored in the precalculated database. Pheromone levels are updated according to the quality of discovered solutions, reinforcing promising routes while discouraging inefficient ones. This iterative process continues until the maximum number of cycles is reached. The advantage of ACO is its ability to explore diverse path alternatives and adaptively converge toward efficient routes. However, its performance depends on parameter tuning (pheromone evaporation and exploration–exploitation balance) and may require more computation time than deterministic methods.

Algorithm 2. Ant Colony Optimization Pseudo Code for Precalculation.
Input	Transfer graph Tg
Output	Precalculation database DB
1	For each component Ck in Tg do
2	For each transfer vertex v in TransferNodes(Ck) do
3	Initialize precalculation database (DB)
4	Set cycle counter nbCycle ← 0
5	While (nbCycle < MaxCycle) do
6	P ← ExploreGraph(v, Ck)
7	ClassifyPaths(P)
8	DB ← DecomposeAndStore(P)
9	UpdatePheromones(Ck, P)
10	nbCycle ← nbCycle + 1
11	End while
12	End for
13	End for
14	Return (DB)
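The two ingredients that drive ExploreGraph and UpdatePheromones in Algorithm 2 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the default values of `alpha`, `beta`, and `rho` are those reported in Section 6, whereas the deposit rule (1/cost on each edge of a discovered path, as in the classical Ant System) and the dictionary-based data layout are assumptions, since the text does not specify them.

```python
def transition_probabilities(pheromone, visibility, alpha=1.0, beta=2.0):
    """Probability of choosing each candidate edge, proportional to tau^alpha * eta^beta."""
    weights = {e: (pheromone[e] ** alpha) * (visibility[e] ** beta) for e in pheromone}
    total = sum(weights.values())
    return {e: w / total for e, w in weights.items()}


def update_pheromones(pheromone, paths, rho=0.10):
    """Evaporate every trail, then deposit 1/cost on the edges of each discovered path."""
    updated = {e: (1.0 - rho) * tau for e, tau in pheromone.items()}
    for edges, cost in paths:
        for e in edges:
            updated[e] += 1.0 / cost  # assumed deposit rule
    return updated


# Hypothetical choice between two outgoing edges of vertex "a".
pheromone = {("a", "b"): 1.0, ("a", "c"): 1.0}
visibility = {("a", "b"): 0.5, ("a", "c"): 0.25}  # e.g., inverse travel duration
probs = transition_probabilities(pheromone, visibility)
new_pheromone = update_pheromones(pheromone, [([("a", "b")], 4.0)])
```

With equal pheromone on both edges, the choice is driven by visibility alone; after the update, the edge used by the discovered path carries more pheromone than the unused one.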

5.3. Pre-Calculations with Depth-First Search Algorithm

Depth-First Search (DFS) provides an exhaustive exploration strategy by recursively traversing all feasible paths from a source to a destination vertex (Algorithm 3). The algorithm maintains the current path, cumulative cost, and visited vertices to prevent cycles [27]. When the destination is reached, the cost is compared with the best-known cost, and the best path is updated accordingly. By exploring all accessible paths under temporal constraints, DFS guarantees completeness and identifies the globally optimal solution.

Algorithm 3. Depth-first search Algorithm Pseudo Code for Precalculation.
Input	Intermodal component C = (V, E, M, T, TV), start vertex (s), destination vertex (d), initial time (t)
Output	bestPath, bestCost
1	Initialize bestCost ← +∞
2	Initialize bestPath ← ∅
3	Define recursive procedure DFS(currentVertex, currentTime, currentCost, visited, path)
4	If (currentVertex = d) then
5	If (currentCost < bestCost) then
6	bestCost ← currentCost
7	bestPath ← path
8	End if
9	Return
10	End if
11	For each edge e ∈ E⁺(currentVertex) do
12	If AccessibleEdge(e, currentTime) and Destination(e) ∉ visited then
13	nextVertex ← Destination(e)
14	nextCost ← currentCost + cost(e)
15	nextTime ← currentTime + time(e)
16	DFS(nextVertex, nextTime, nextCost, visited ∪ {nextVertex}, path ∪ {e})
17	End if
18	End for
19	End procedure
20	Call DFS(s, t, 0, {s}, ∅)
21	Return (bestPath, bestCost)

Although DFS is not typically used for time-dependent shortest-path problems due to its exponential search space and lack of pruning mechanisms, its inclusion in this study serves two methodological purposes. First, DFS provides an exhaustive baseline against which heuristic and deterministic algorithms can be compared, allowing us to validate the correctness of optimal solutions in small- and medium-scale instances. Second, DFS is particularly useful within the transfer-graph decomposition framework, where pre-calculations operate on reduced unimodal components; in these smaller subgraphs, DFS can enumerate all time-feasible paths, enabling a comprehensive evaluation of time-dependent feasibility constraints.
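The exhaustive enumeration of Algorithm 3 can be sketched in Python as shown below. The edge layout `(destination, departure, duration, cost)` and the function name `dfs_best_path` are assumptions made for illustration; in particular, the sketch takes the arrival time as departure plus duration, so that waiting for a scheduled departure is included in the elapsed time, which is one possible reading of time(e) in the pseudocode.

```python
import math


def dfs_best_path(edges, source, dest, start_time):
    """Enumerate all time-feasible simple paths and keep the cheapest one.

    `edges[u]` lists (v, departure, duration, cost) options leaving u.
    """
    best = {"cost": math.inf, "path": None}

    def visit(vertex, time, cost, visited, path):
        if vertex == dest:
            if cost < best["cost"]:
                best["cost"], best["path"] = cost, path
            return
        for v, dep, duration, edge_cost in edges.get(vertex, []):
            # AccessibleEdge: not yet departed; cycle check: v not visited
            if dep >= time and v not in visited:
                visit(v, dep + duration, cost + edge_cost,
                      visited | {v}, path + [(vertex, v, dep)])

    visit(source, start_time, 0.0, frozenset([source]), [])
    return best["path"], best["cost"]


# Hypothetical component: the first a->d option leaves before the traveler arrives at a.
edges = {
    "s": [("a", 0, 5, 2.0), ("d", 1, 30, 10.0)],
    "a": [("d", 4, 3, 1.0), ("d", 6, 3, 4.0)],
}
best_path, best_cost = dfs_best_path(edges, "s", "d", 0)
```

In this example the cheapest a → d option is rejected because it departs before the traveler reaches a, which illustrates how the temporal constraint prunes otherwise attractive paths.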

5.4. Pre-Calculations with Genetic Algorithm

The GA is a population-based evolutionary algorithm (Algorithm 4) that mimics natural selection. The procedure begins with a population of random paths between a given source and destination [12]. Each path is evaluated using a fitness function based on the inverse of its travel cost. New generations are produced through crossover (combining subpaths of parent solutions) and mutation (randomly altering parts of a path). Feasibility checks ensure time-dependent constraints and accessibility rules are respected. The best individuals are retained through elitism, and the process repeats until convergence or a maximum number of generations is reached.

Algorithm 4. Genetic Algorithm Pseudo Code for Precalculation.
Input	Intermodal component C = (V, E, M, T, TV), start vertex (s), destination vertex (d), initial time (t)
Output	Best path (bestPath)
1	Initialize a population P with N random feasible paths from s to d
2	Evaluate the fitness of each path p in P using fitness(p) = 1/totalCost(p)
3	Repeat
4	Select parent paths from P using tournament or roulette-wheel selection
5	Apply crossover between selected parents to generate offspring paths
6	Apply mutation to offspring by randomly modifying a sub-path
7	Ensure feasibility of offspring (time windows, accessibility, and transfer constraints)
8	Evaluate the fitness of offspring
9	Create a new population P′ using elitism and offspring selection
10	Set P ← P′
11	Until stopping criterion is met (maximum generations or convergence)
12	Select bestPath ∈ P with minimum total cost
13	Return bestPath
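The fitness, crossover, and selection steps of Algorithm 4 can be illustrated with the following Python sketch. It is a simplified sketch, not the authors' implementation: paths are plain vertex lists, crossover splices two parents at a shared intermediate vertex (one common way to combine subpaths), and the time-dependent feasibility check of step 7 is omitted. All function names and the toy cost table are hypothetical.

```python
import random


def total_cost(path, cost):
    """Sum of edge costs along a vertex sequence."""
    return sum(cost[(u, v)] for u, v in zip(path, path[1:]))


def fitness(path, cost):
    """Inverse of the total travel cost, as in step 2 of Algorithm 4."""
    return 1.0 / total_cost(path, cost)


def crossover(parent_a, parent_b, rng):
    """Splice the parents at a shared intermediate vertex.

    Returns None if no such vertex exists or if the child repeats a vertex.
    """
    common = [v for v in parent_a[1:-1] if v in parent_b[1:-1]]
    if not common:
        return None
    pivot = rng.choice(common)
    child = parent_a[:parent_a.index(pivot)] + parent_b[parent_b.index(pivot):]
    return child if len(set(child)) == len(child) else None


def tournament(population, cost, rng, k=2):
    """Return the fittest of k individuals drawn without replacement."""
    return max(rng.sample(population, k), key=lambda p: fitness(p, cost))


# Hypothetical edge costs and parents sharing the intermediate vertex "c".
cost = {("s", "a"): 1.0, ("a", "c"): 5.0, ("c", "d"): 6.0, ("s", "b"): 4.0,
        ("b", "c"): 4.0, ("c", "e"): 1.0, ("e", "d"): 1.0}
parent_a = ["s", "a", "c", "d"]
parent_b = ["s", "b", "c", "e", "d"]
rng = random.Random(42)
child = crossover(parent_a, parent_b, rng)
winner = tournament([parent_a, parent_b, child], cost, rng, k=3)
```

Here the child inherits the cheap prefix of one parent and the cheap suffix of the other, so it is fitter than both, which is the effect crossover is meant to exploit.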

6. Experimental Setup

This section presents the experimental design used to evaluate the proposed approach. To ensure reproducibility, fairness, and controlled variability across experiments, all evaluations were performed using repeated trials and deterministic randomness. For each tested network size (50, 100, 200, 500, 1000, 2000, and 4000 vertices), two independent runs were executed, and the reported metrics correspond to the mean execution time and mean peak memory consumption. Algorithms involving stochastic behavior, namely Ant Colony Optimization (ACO) and the Genetic Algorithm (GA), were executed using a fixed pseudorandom seed (seed = 42), ensuring that all methods operated on identical graph instances while preserving their internal exploration dynamics.
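The measurement protocol described above (repeated runs, fixed seed, mean time, and mean peak memory) can be sketched in Python as follows. The text does not state which tools were used to record time and memory; the use of `time.perf_counter` and `tracemalloc` is an assumption, and `tracemalloc` reports memory allocated by Python objects rather than the total process footprint. The helper name `measure` and the example workload are hypothetical.

```python
import random
import time
import tracemalloc


def measure(func, *args, runs=2, seed=42):
    """Return (mean seconds, mean peak bytes) over `runs` executions of func(*args)."""
    times, peaks = [], []
    for _ in range(runs):
        random.seed(seed)  # identical pseudorandom stream for every run
        tracemalloc.start()
        start = time.perf_counter()
        func(*args)
        times.append(time.perf_counter() - start)
        peaks.append(tracemalloc.get_traced_memory()[1])  # (current, peak)
        tracemalloc.stop()
    return sum(times) / runs, sum(peaks) / runs


# Hypothetical workload standing in for one precalculation run.
mean_time, mean_peak = measure(
    lambda n: sorted(random.random() for _ in range(n)), 10_000
)
```

Resetting the seed inside the loop keeps the stochastic algorithms on the same pseudorandom stream in every run, so differences between runs come from the machine rather than from the search trajectory.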

Synthetic time-dependent intermodal networks were generated through a unified and controlled procedure. Vertices were partitioned into five modal components, each forming a unimodal subgraph, and a fixed proportion of vertices was designated as transfer vertices following the transfer-graph formalism. For every directed connection, ten time-dependent travel options were generated; their departure times, durations, monetary costs, and comfort penalties were produced through calibrated functions enforcing realistic correlations between travel duration and generalized travel cost. This generation strategy yields synthetic networks whose connectivity, density, and temporal patterns are consistent with real metropolitan transport systems, including the multimodal characteristics observed in Casablanca.

To ensure a fair comparison, the parameter configurations of the stochastic algorithms were kept constant across all experiments. ACO used 8 ants over 8 cycles (α = 1.0, β = 2.0, ρ = 0.10), whereas the GA used a population of 16 individuals, 12 generations, a crossover rate of 0.8, and a mutation rate of 0.15. Dijkstra and DFS are deterministic and therefore contain no internal randomness; DFS was additionally constrained by bounded branching and depth limits proportional to the square root of the component size, ensuring tractability on large networks while maintaining exhaustive exploration within each unimodal component.

All algorithms were implemented in Python version 3.13, and the program simulations were executed in the PyCharm Version 2024.3.2 environment on a computer with the following configuration: Intel Core i5-7200U, 2.50 GHz, and 16 GB of RAM. The results are reported in Figure 2 and Figure 3.

Figure 2 presents the average execution times of the Dijkstra, ACO, DFS, and GA algorithms as a function of intermodal network size ∣V∣. The results clearly show that Dijkstra and the Genetic Algorithm (GA) are the most efficient in terms of speed: their execution times remain comparatively low and grow steadily with increasing network size, highlighting their suitability for large-scale networks.

In contrast, Ant Colony Optimization (ACO) and Depth-First Search (DFS) demonstrate a pronounced increase in execution time as the network size expands beyond medium-scale instances. This trend reflects their substantial computational resource requirements, which may constrain their applicability in real-time or resource-limited environments. Nevertheless, the heuristic nature of ACO and the exploratory behavior of DFS can yield valuable problem-specific insights and localized optimization advantages under certain operational conditions.

Overall, Dijkstra’s algorithm and the Genetic Algorithm (GA) exhibit superior performance in terms of execution time, whereas Ant Colony Optimization (ACO) and Depth-First Search (DFS) remain viable alternatives in contexts where their respective methodological strengths can be effectively leveraged.

In Figure 3, the evolution of memory consumption across Dijkstra, Ant Colony Optimization (ACO), Depth-First Search (DFS), and the Genetic Algorithm (GA) is presented as a function of ∣V∣. A gradual increase in RAM usage is observed for all algorithms up to approximately 2000 vertices. Beyond this threshold, their behaviors diverge markedly. The Dijkstra algorithm maintains a steady upward trend, reaching the highest memory consumption level (>130 MB at 4000 vertices).

In contrast, DFS and GA exhibit stabilization beyond 2000 vertices, sustaining relatively constant memory requirements (≈85–90 MB). This pattern suggests a predictable and manageable resource footprint, even for larger-scale networks. The ACO algorithm displays a distinct behavior, with memory consumption peaking around 2000 vertices before declining at higher scales, rendering it the most memory-efficient approach for very large networks.

In summary, the Dijkstra algorithm is characterized by high memory intensity, whereas ACO demonstrates superior efficiency on large-scale graphs. DFS and GA occupy an intermediate position, offering stable, balanced memory usage alongside algorithmic flexibility, making them practical choices for applications requiring both consistency and adaptability in resource management.

Table 2 compares execution time and memory usage for Dijkstra, ACO, DFS, and GA across increasing network sizes. Dijkstra is the fastest method on all instances but shows strong memory growth, exceeding 130 MB at ∣V∣ = 4000, which limits its scalability. ACO provides diverse solution exploration but becomes computationally expensive as the network grows, with execution times rising from 3.2 s to over 2300 s. DFS exhibits the steepest increase in runtime, reaching more than 31,000 s at 4000 vertices, reflecting its exponential search behavior, although its memory use remains relatively stable. GA offers the best balance, maintaining moderate execution times and stable memory consumption (below 85 MB) even at the largest scale. Overall, the table confirms that each algorithm excels in different aspects: Dijkstra in speed, GA in scalability, ACO in exploration, and DFS only in small graphs.

As shown in Table 3, the synthetic networks considered in our experiments are comparable in size to real-world metropolitan public transport networks such as Casablanca, Berlin, Sydney, Paris, and Moscow. This indicates that the performance evaluation of Dijkstra, ACO, DFS, and GA was carried out at a realistic scale. The scalability achieved by the proposed approach is therefore representative, in terms of network size, of actual urban intermodal systems of metropolitan scale.

7. Reinforcement Learning Integration in Transfer Graphs

This section presents an alternative approach to the precomputation-based methods by integrating a Deep Q-Network (DQN) agent with the transfer-graph model. The objective is to enable the agent to learn optimal routing policies through direct interaction with the intermodal network environment, thus eliminating the need for costly precalculations. The proposed reinforcement learning framework treats the transport network as a dynamic environment in which each vertex represents a potential decision point, and each arc corresponds to a time-dependent transition. The agent observes the current state defined by location, time, and available connections and selects actions that minimize a generalized travel cost based on time, fare, and transfer penalties. Through iterative training, the DQN progressively improves its policy to produce near-optimal intermodal routes.

7.1. Comparative Evaluation and Selection of Reinforcement Learning Algorithms for Intermodal Routing

The choice of the Deep Q-Network (DQN) algorithm for integrating reinforcement learning within the transfer-graph framework is motivated by both theoretical suitability and practical efficiency. DQN combines the discrete-state decision structure of classical Q-learning with the function-approximation capacity of deep neural networks, allowing it to handle the large, partially continuous state spaces typical of time-dependent intermodal transport networks. Unlike tabular Q-learning or SARSA, which require explicit enumeration of every state–action pair, DQN generalizes across unseen network states, thereby enabling scalable learning despite the combinatorial growth in the number of vertices, modes, and departure times [20,21].

From a modeling perspective, the transfer graph defines discrete, high-dimensional decision points, where each vertex–time pair represents a possible agent state, and each intermodal transfer defines an action. DQN’s value-function approximation is therefore particularly suited to capturing nonlinear relationships between spatial, temporal, and modal attributes without relying on handcrafted features [22]. Furthermore, DQN incorporates experience replay and a target network to stabilize convergence, features that enhance learning robustness in dynamic or stochastic environments such as passenger networks subject to delays, schedule uncertainty, and congestion [20,23].

Other reinforcement learning algorithms were evaluated but found less appropriate for this context:

Policy-gradient methods (e.g., REINFORCE, PPO, A3C) exhibit strong performance in continuous-control domains but demand high sample complexity and sensitive hyperparameter tuning, characteristics unsuitable for the sparse, discrete action space of intermodal routing [24].

Actor–critic architectures improve stability yet increase computational overhead and training time without significant performance gains when actions correspond to a limited set of feasible transfers [21].

Model-based RL can leverage known transition dynamics but requires accurate modeling of time-dependent stochastic processes; DQN, being model-free, learns directly from simulated experience within the transfer-graph environment [20].

Empirical evidence from recent studies confirms DQN’s advantages for transport decision-making. Ref. [22] demonstrated that a DQN-based routing agent outperformed classical algorithms such as Dijkstra and Ant Colony Optimization, reducing travel time by over 20% in dynamic urban networks. Ref. [25] applied a DRL framework to passenger behavior optimization in multimodal journey planning, showing significant gains in user satisfaction and operator efficiency. Similarly, ref. [24] used deep RL to coordinate intermodal journeys under capacity constraints, validating the robustness of value-based approaches for intermodal passenger systems.

Consequently, the DQN provides an optimal balance between learning efficiency, scalability, and computational tractability for time-dependent intermodal routing. Its integration within the transfer-graph model enables the system to approximate optimal routing policies with minimal pre-calculation, achieving real-time adaptability in complex multimodal passenger networks [20,21,25].

7.2. Methodology

Figure 4 shows the interaction loop between the reinforcement learning agent and the transfer graph environment. The environment encodes the intermodal transport network. Vertices represent stations or facilities. Arcs represent time-dependent connections between modes such as train, tram, bus, busway, and parking. At each step, the agent observes the state $s_t$. This state contains the current vertex, the departure time, the destination, and the last mode used. From this, the policy π selects an action $a_t$. The action may be to travel along a valid edge or to wait for the next departure. The action is applied to the environment, which produces the next state $s_{t+1}$ and a reward $r_t$. The reward is the negative generalized travel cost, defined as time plus fare plus transfer penalties, with a bonus when the destination is reached. Over episodes, the agent updates the value function Q(s,a). The policy improves and converges to near-optimal routing.

7.2.1. Environment

The environment is the intermodal transfer graph G = (V, E, M, P).

State at time t: $s_t$ = (current vertex $v_t$, time $t$, destination $s_{dest}$).

(The previous mode is no longer required since the transfer cost is ignored.)

Transition: Follows the schedule of the chosen edge or the WAIT action.

Termination: When the destination vertex is reached or when the time horizon is exceeded.
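The state, transition, and termination rules listed above can be illustrated with a toy Python environment. This is a minimal sketch under assumptions that are not taken from the paper: trips are stored as `(destination, departure, arrival)` tuples, a travel action is a trip departing exactly at the current time, and WAIT advances the clock to the next later departure from the current vertex. The class name `TransferGraphEnv` and its methods are hypothetical, and rewards are left out for brevity.

```python
WAIT = "WAIT"


class TransferGraphEnv:
    """Toy environment whose state is (vertex, time, destination)."""

    def __init__(self, trips, horizon):
        self.trips = trips      # trips[u] = list of (v, departure, arrival)
        self.horizon = horizon  # latest admissible time
        self.state = None

    def reset(self, origin, destination, start_time):
        self.state = (origin, start_time, destination)
        return self.state

    def valid_actions(self):
        vertex, time, _ = self.state
        trips = self.trips.get(vertex, [])
        actions = [trip for trip in trips if trip[1] == time]
        if any(dep > time for _, dep, _ in trips):
            actions.append(WAIT)  # a later departure exists
        return actions

    def step(self, action):
        vertex, time, dest = self.state
        if action == WAIT:
            # jump to the next departure from the current vertex
            time = min(dep for _, dep, _ in self.trips[vertex] if dep > time)
        else:
            vertex, _, time = action  # ride the trip: new vertex, arrival time
        self.state = (vertex, time, dest)
        done = vertex == dest or time > self.horizon
        return self.state, done


# Hypothetical two-trip network from e to n.
env = TransferGraphEnv({"e": [("f", 5, 12)], "f": [("n", 12, 20)]}, horizon=60)
env.reset("e", "n", 0)
first_actions = env.valid_actions()
state_after_wait, _ = env.step(WAIT)
state_after_ride, _ = env.step(env.valid_actions()[0])
final_state, done = env.step(env.valid_actions()[0])
```

Restricting `valid_actions` to trips that can actually be boarded is what later allows the agent to mask infeasible actions when selecting among Q-values.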

7.2.2. Agent

The agent is a DQN.

Input: state vector $s_t$.

Output: Q-values $Q(s_t, a)$ for each valid action.

The agent follows an ε-greedy policy:

$$a_t = \begin{cases} \arg\max_{a} Q(s_t, a) & \text{with probability } 1 - \varepsilon \\ \text{random valid action} & \text{with probability } \varepsilon \end{cases}$$

(16)

ε decays over training, shifting from exploration to exploitation.
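The ε-greedy rule of Equation (16), restricted to valid actions, can be sketched in Python as follows. The exponential decay schedule and its constants (`eps_start`, `eps_end`, `decay`) are placeholder assumptions, since the text does not report them; the function names and the example Q-values are likewise hypothetical.

```python
import math
import random


def epsilon_at(step, eps_start=1.0, eps_end=0.05, decay=5000.0):
    """Exploration rate after `step` updates (all constants are placeholders)."""
    return eps_end + (eps_start - eps_end) * math.exp(-step / decay)


def select_action(q_values, valid_actions, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the best valid action."""
    if rng.random() < epsilon:
        return rng.choice(valid_actions)
    return max(valid_actions, key=lambda a: q_values[a])


# Hypothetical Q-values; "train" is not available in the current state.
q_values = {"bus": -12.0, "tram": -9.5, "WAIT": -11.0, "train": -3.0}
valid = ["bus", "tram", "WAIT"]
greedy = select_action(q_values, valid, 0.0, random.Random(0))
explored = select_action(q_values, valid, 1.0, random.Random(0))
```

Although "train" has the highest Q-value in this example, it is never returned because the maximization runs only over the valid actions.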

7.2.3. Actions

Travel along a feasible edge (train, tram, bus, busway, parking).

WAIT until the next departure.

7.2.4. Rewards

The reward function is

$$R_t = w_{\text{time}} \cdot \Delta t + w_{\text{fare}} \cdot \text{cost} + w_{\text{transfer}} \cdot n_{\text{transfer}}$$

(17)

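where $w_{\text{time}}$, $w_{\text{fare}}$, and $w_{\text{transfer}}$ are user-defined weights. Because Section 7.2 describes the reward as the negative generalized travel cost, Equation (17) agrees with that description only if the weights are non-positive or, equivalently, if the weighted sum is negated.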

A positive bonus of +50 is granted when the destination vertex is reached to accelerate convergence toward feasible optimal paths.
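A minimal Python sketch of the step reward is given below. It assumes that the weighted sum is negated, following the description of the reward as the negative generalized travel cost in Section 7.2; the unit weights are placeholders, and only the bonus of +50 is taken from the text. The function name `reward` and its argument names are hypothetical.

```python
def reward(delta_t, fare, n_transfers, reached_destination,
           w_time=1.0, w_fare=1.0, w_transfer=1.0, bonus=50.0):
    """Negative generalized cost of one step, plus the arrival bonus."""
    generalized_cost = w_time * delta_t + w_fare * fare + w_transfer * n_transfers
    # Assumed sign convention: cheaper steps yield higher (less negative) rewards.
    return -generalized_cost + (bonus if reached_destination else 0.0)


# Hypothetical step: 10 time units, fare 2.5, one transfer.
step_reward = reward(10, 2.5, 1, reached_destination=False)
final_reward = reward(10, 2.5, 1, reached_destination=True)
```

With this sign convention, maximizing the cumulative reward is equivalent to minimizing the generalized travel cost, and the bonus separates successful from unsuccessful episodes.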

7.2.5. Training

The DQN minimizes the Bellman error:

$$L = \left( Q(s, a) - \left( r + \gamma \cdot \max_{a'} Q_{\text{target}}(s', a') \right) \right)^{2}$$

(18)

The Bellman error [31] measures the difference between the network’s current prediction $Q(s, a)$ and the target value. Here, $Q(s, a)$ is the estimated value of taking action $a$ in the state $s$, while the target is given by $r + \gamma \cdot \max_{a'} Q_{\text{target}}(s', a')$.

The first term $r$ is the immediate reward obtained after taking action $a$, and the second term $\gamma \cdot \max_{a'} Q_{\text{target}}(s', a')$ represents the best possible future return estimated by the target network, scaled by the discount factor γ (0 < γ < 1).

Together, this target expresses what the return should be if the agent were acting optimally. The difference between prediction and target is the Bellman error, and squaring it penalizes larger deviations more strongly. By minimizing this squared error, the network gradually aligns its Q-values with the optimal action values.
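The target and the squared Bellman error of Equation (18) can be computed for a single transition as in the following Python sketch. The handling of terminal transitions (no bootstrap term once the episode has ended) is standard DQN practice but is an assumption here, as the text does not mention it; the function names and numerical values are hypothetical.

```python
def td_target(reward, next_q_values, gamma=0.99, done=False):
    """r + gamma * max_a' Q_target(s', a'), without bootstrapping on terminal steps."""
    if done or not next_q_values:
        return reward
    return reward + gamma * max(next_q_values)


def bellman_loss(q_sa, target):
    """Squared Bellman error of Equation (18) for one transition."""
    return (q_sa - target) ** 2


# Hypothetical transition: reward -5, two next actions valued -10 and -20.
target = td_target(-5.0, [-10.0, -20.0], gamma=0.5)
loss = bellman_loss(-8.0, target)
terminal_target = td_target(45.0, [1.0], gamma=0.5, done=True)
```

In the example, the prediction of -8 overestimates the target of -10 by 2, giving a squared error of 4.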

7.3. Case Study

A DQN is integrated into the transfer-graph model to enable adaptive, learning-based routing within the time-dependent intermodal network.

7.3.1. Deep Q-Network (DQN) Routing Component

To ensure reproducibility, this section provides the technical details of the DQN agent integrated into the transfer-graph environment. The agent interacts with the time-dependent intermodal network through a structured state representation composed of: (i) the current vertex, (ii) the destination vertex, (iii) the previous transport mode, (iv) normalized current time, and (v) a bias term.

7.3.2. Network Architecture

The Q-function is approximated using a Dueling Double DQN implemented in PyTorch Version 2.9.1.

Input layer: (2 × |V| + |M| + 2) features.
Hidden layers: Two fully connected layers with 256 ReLU units each.
Output layer: Q-values for all feasible actions (including a WAIT action).
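The input dimension 2 × |V| + |M| + 2 is consistent with a one-hot encoding of the current vertex, the destination vertex, and the previous mode, followed by the normalized time and a bias term. The Python sketch below illustrates this reading together with the standard dueling aggregation Q(s, a) = V(s) + A(s, a) − mean(A). The one-hot layout, the normalization by a time horizon, and the function names are assumptions for illustration; the learned layers themselves are not reproduced.

```python
def encode_state(current, destination, prev_mode, time, vertices, modes, horizon):
    """One-hot current vertex | one-hot destination | one-hot mode | time | bias."""
    def one_hot(item, universe):
        return [1.0 if x == item else 0.0 for x in universe]

    return (one_hot(current, vertices) + one_hot(destination, vertices)
            + one_hot(prev_mode, modes) + [time / horizon, 1.0])


def dueling_q(value, advantages):
    """Combine a state value and per-action advantages into Q-values."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + adv - mean_adv for adv in advantages]


# Hypothetical network with three vertices and two modes.
vertices = ["e", "f", "n"]
modes = ["bus", "tram"]
x = encode_state("e", "n", "bus", 30, vertices, modes, horizon=120)
q = dueling_q(2.0, [1.0, 3.0])
```

Subtracting the mean advantage keeps the decomposition into value and advantage identifiable, which is the usual motivation for the dueling head.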

7.3.3. Training Configuration

Training follows an ε-greedy exploration policy with exponential decay. The network is optimized using the Adam algorithm, and the Huber loss is adopted to improve robustness to outliers in the temporal-difference updates. A discount factor γ = 0.99 is used, and all gradients are clipped at a maximum norm of 1.0 to ensure stable learning. All stochastic components were executed using a fixed random seed to guarantee reproducibility.
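Two of the stabilizing ingredients named above, the Huber loss and gradient clipping by norm, can be written out in plain Python as follows. This sketch only illustrates the underlying formulas; it does not reproduce the PyTorch training loop, and the Huber threshold `delta = 1.0` is an assumption because the text does not report it. The maximum norm of 1.0 is the value stated above.

```python
import math


def huber(error, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond it."""
    a = abs(error)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def clip_by_norm(grads, max_norm=1.0):
    """Rescale the gradient vector so that its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    return [g * max_norm / norm for g in grads]


small = huber(0.5)
large = huber(3.0)
clipped = clip_by_norm([3.0, 4.0])
```

A temporal-difference error of 3 contributes 2.5 under the Huber loss instead of 4.5 under half the squared error, which shows how large outliers are damped.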

7.3.4. Episode Configuration and Convergence Monitoring

The agent was trained for 30,000 gradient steps using an episode horizon of 200 time steps to prevent unbounded exploration. Convergence was monitored through three indicators computed as moving averages:

Cumulative episode reward;
Success rate (agent reaches destination);
Episode length (routing efficiency).

All metrics stabilized after approximately 20,000 steps, indicating convergence.
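The moving averages used for convergence monitoring can be computed as in the following Python sketch. The window length is not reported in the text, so the value used here is only a placeholder, and the function name `moving_average` is hypothetical.

```python
from collections import deque


def moving_average(values, window):
    """Trailing mean over at most `window` most recent values."""
    recent, total, out = deque(), 0.0, []
    for v in values:
        recent.append(v)
        total += v
        if len(recent) > window:
            total -= recent.popleft()  # drop the oldest value
        out.append(total / len(recent))
    return out


# Hypothetical per-episode success flags (1 = destination reached).
success = [0, 0, 1, 1, 1]
smoothed = moving_average(success, window=2)
```

The same helper applies unchanged to cumulative episode rewards and episode lengths.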

As shown in Figure 5, the curve illustrates the frequency with which the agent successfully reaches the destination vertex across training episodes. At the onset of training, a low and unstable success rate is observed, attributable to the random exploration phase. As training progresses, the curve increases steadily, indicating that the agent progressively learns feasible and effective actions. The observed fluctuations are expected and reflect the dynamic balance between exploration, the testing of new paths, and exploitation, the use of previously learned strategies. Overall, the upward trend demonstrates that the reinforcement learning agent is gradually converging toward a stable and effective routing policy.

Figure 6 illustrates the evolution of path length, expressed in steps, throughout the training process. In the early stages, the agent tends to follow longer, less efficient routes, resulting in considerable variability in path length. As learning advances, shorter and more direct trajectories are increasingly selected, reflecting the agent’s growing ability to approximate the optimal solution. Occasional spikes in path length are observed, primarily as a consequence of continued exploration or unsuccessful trials; however, these variations gradually diminish as the policy stabilizes. Overall, the downward and stabilizing trend indicates that the agent not only learns to reach the destination reliably but also refines its strategy to minimize the number of steps required.

After convergence, the agent was able to compute the shortest path between the origin vertex e and the destination vertex n. The trained agent outputs the optimal sequence of vertices and corresponding transfer edges, and the total travel time and associated fare. For the considered query (e → n), the agent identified the path e → f → c → k → o → n, as illustrated in Figure 7, which corresponds to the optimal intermodal route under the defined cost function. This result highlights the ability of the DQN to exploit the structural properties of the transfer graph and to produce detailed trip information, including departure and arrival times at each step of the journey.

8. Discussion

The introduction outlined four central challenges: (i) constructing an extended transfer-graph model that incorporates time-dependent schedules, multi-criteria costs, intermodal transfers, and parking access; (ii) developing a decomposition strategy that exploits the transfer-graph structure to improve computational efficiency; (iii) establishing a unified benchmarking environment for evaluating Dijkstra, ACO, DFS, and GA under identical time-dependent intermodal conditions and varying network scales; and (iv) designing an AI-enhanced routing framework that integrates a DQN with the transfer-graph model to enable adaptive routing under schedule variability. The results presented in this study provide a basis for evaluating progress toward these objectives.

8.1. Effectiveness of the Transfer-Graph Representation

The first objective concerned the need for a scalable, structurally coherent model of time-dependent intermodal networks. The transfer-graph formulation successfully met this requirement. By decomposing the global network into unimodal components linked through explicit transfer arcs, the model mitigates one of the primary weaknesses of time-expanded and layered graphs, namely, exponential growth in graph size when representing time-dependent states. This aligns with prior findings in multimodal modeling [4,8], and the present results confirm that the transfer graph provides an efficient basis for integrating schedules, modal interactions, and multi-criteria costs.

8.2. Comparative Algorithmic Performance and Interpretation

The second and third objectives concerned whether classical and heuristic algorithms could be systematically evaluated within the transfer-graph framework and what their performance characteristics imply. As shown in Table 2, each method exhibits distinct behavior as network size increases. Dijkstra’s algorithm delivered optimal solutions with predictable execution times on small and medium networks, but its memory consumption grew sharply, exceeding 130 MB at ∣V∣ = 4000, reflecting the scalability limitations noted in urban network studies such as [3]. This is expected because Dijkstra maintains explicit frontier and distance-state structures that expand with graph size, reducing its suitability for real-time precomputation in large ITS environments. The Genetic Algorithm (GA) provided the most balanced performance, maintaining competitive execution times and stable memory usage below 85 MB even on the largest instances, consistent with prior observations on the scalability of evolutionary metaheuristics in multimodal routing. Ant Colony Optimization (ACO) demonstrated strong exploratory behavior but incurred substantially higher computation times, since it prioritizes solution diversity over runtime efficiency. Depth-First Search (DFS) is exhaustive on small networks but rapidly became infeasible as the graph size increased, owing to its exponential search growth; it nevertheless showed notably low memory consumption across all tested instances, confirming its theoretical advantage in space efficiency. Collectively, these results confirm that no single algorithm is superior across all criteria: Dijkstra excels in accuracy, GA in scalability, ACO in solution diversity, and DFS in completeness.

8.3. Integration and Interpretation of Reinforcement Learning

The fourth objective concerned the potential for adaptive routing. As an alternative framework, a DQN was integrated into the transfer-graph environment to explore the capabilities of learning-based routing. The agent was able to learn routing strategies that approximate optimal paths while maintaining negligible inference cost once trained. It also demonstrated adaptability to schedule variations, in line with recent findings in reinforcement learning for multimodal routing [18,19]. Although evaluated on medium-sized networks, the DQN showed the capacity to generalize across unseen states and react to disturbances, headway fluctuations, or modal delays. Thus, the DQN component represents a promising framework for learning-based intermodal routing.

8.4. Limitations and Novel Contributions

While the proposed approach successfully addresses the four objectives outlined in the introduction, several limitations must be acknowledged. The synthetic networks used in the experiments approximate real-world multimodal structures but do not fully capture operational variability such as stochastic delays or congestion dynamics. The DQN component was evaluated only on medium-scale instances and has not yet been tested on full metropolitan networks, which limits claims regarding its scalability. Despite these constraints, the study introduces: a unified transfer-graph architecture tailored to multi-criteria, time-dependent intermodal routing; a benchmarking environment that enables rigorous comparison of deterministic, heuristic, and evolutionary algorithms on networks comparable to real urban systems; and the integration of a reinforcement learning agent equipped with action masking and temporal state encoding within a transfer-graph structure, a combination not previously explored in intermodal routing research.

9. Conclusions and Perspectives

This study introduced a transfer-graph framework for modeling time-dependent intermodal transport, enabling the unified representation of heterogeneous modes, scheduled operations, and transfer constraints within a scalable structure. Within this framework, classical and heuristic algorithms were systematically evaluated to analyze their computational behavior as the network size increased. The results confirm that Dijkstra’s algorithm remains highly effective for small to medium-scale instances, while the Genetic Algorithm (GA) exhibits more stable scalability on larger networks. Ant Colony Optimization (ACO) and Depth-First Search (DFS) provide complementary exploratory capabilities but incur substantially higher computational costs, particularly in dense or highly time-dependent settings.

The experiments were conducted on synthetic networks designed to reproduce key structural properties of real metropolitan systems, which allows controlled scalability analysis and fair algorithmic comparison but does not fully capture real-world operational variability such as stochastic delays, demand fluctuations, and data uncertainty. Within this context, the Deep Q-Network (DQN) component was intentionally positioned as a proof-of-concept rather than a direct competitor to exact or meta-heuristic algorithms. The DQN successfully recovered near-optimal paths and demonstrated rapid adaptation to temporal changes; however, its evaluation was limited to medium-scale synthetic settings. Unlike the baseline algorithms, which were rigorously assessed across increasing network sizes, the DQN results primarily illustrate feasibility and potential advantages in adaptability and inference speed. Extending this evaluation to large-scale, real-world networks remains an important direction for future research.

Future research will extend the proposed framework to larger and more dynamic networks, incorporate full multi-objective cost structures, and explore multi-agent and distributed artificial intelligence techniques to better capture network-wide interactions and passenger flows. Another important direction involves integrating the model into an operational information system supported by a scalable architecture capable of handling high-frequency queries and real-time updates. Leveraging real-time data APIs, such as live schedules, traffic conditions, and service disruptions, will further enhance the approach’s adaptability and practical deployment.

Author Contributions

Conceptualization: K.A., M.E.M. and W.D.; Methodology: K.A., M.E.M., H.G. and W.D.; Software: K.A.; Validation: Y.Z., H.G., M.E.M. and W.D.; Formal analysis: W.D. and H.M.; Investigation: K.A. and H.M.; Resources: W.D., H.M. and M.E.M.; Data curation: K.A. and W.D.; Writing – original draft preparation: K.A. and M.E.M.; Writing – review and editing: K.A., H.G., Y.Z., W.D. and M.E.M.; Visualization: H.G. and K.A.; Supervision: H.G., W.D. and M.E.M.; Project administration: H.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data is contained within this article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
ACO	Ant Colony Optimization
DFS	Depth First Search
GA	Genetic Algorithm
RL	Reinforcement Learning
MDP	Markov decision process
DQN	Deep Q-Network
GHz	Gigahertz
RAM	Random Access Memory
CPU	Central Processing Unit
TV	Transfer Vertex

References

Gronalt, M.; Schultze, R.C.; Posset, M. Intermodal Transport-Basics, Structure, and Planning Approaches. In Sustainable Transportation and Smart Logistics: Decision-Making Models and Solutions; Elsevier: Amsterdam, The Netherlands, 2018; pp. 123–149.
Zhang, J.; Liao, F.; Arentze, T.; Timmermans, H. A multimodal transport network model for advanced traveler information systems. Procedia Soc. Behav. Sci. 2011, 20, 313–322.
Farahani, R.Z.; Miandoabchi, E.; Szeto, W.Y.; Rashidi, H. A review of urban transportation network design problems. Eur. J. Oper. Res. 2013, 229, 281–302.
Mishina, M.; Khrulkov, A.; Solovieva, V.; Tupikina, L.; Mityagin, S. Method of intermodal accessibility graph construction. In Procedia Computer Science; Elsevier B.V.: Amsterdam, The Netherlands, 2022; pp. 42–50.
Idri, A.; Oukarfi, M.; Boulmakoul, A.; Zeitouni, K.; Masri, A. A new time-dependent shortest path algorithm for multimodal transportation network. In Procedia Computer Science; Elsevier B.V.: Amsterdam, The Netherlands, 2017; pp. 692–697.
Ensor, A.; Lillo, F. Colored-Edge Graph Approach for the Modeling of Multimodal Transportation Systems. Asia Pacific J. Oper. Res. 2016, 33, 1650005.
Ayed, H.; Khadraoui, D.; Habbas, Z.; Bouvry, P.; Merche, J.F. Transfer Graph Approach for Multimodal Transport Problems. In Proceedings of the International Conference on Modelling, Computation and Optimization in Information Systems and Management Sciences, Metz, France, 8–10 September 2008; Springer: Berlin/Heidelberg, Germany, 2008. Available online: https://link.springer.com/chapter/10.1007/978-3-540-87477-5_57 (accessed on 13 September 2025).
Ayed, H.; Galvez-Fernandez, C.; Habbas, Z.; Khadraoui, D. Solving time-dependent multimodal transport problems using a transfer graph model. Comput. Ind. Eng. 2011, 61, 391–401. [Google Scholar] [CrossRef]
Yang, R.; Li, D.; Han, B.; Zhou, W.; Yu, Y.; Li, Y.; Zhao, P. Door to door space-time path planning of intercity multimodal transport network using improved ripple-spreading algorithm. Comput. Ind. Eng. 2024, 189, 109996. [Google Scholar] [CrossRef]
Dib, O.; Manier, M.A.; Caminada, A. Memetic algorithm for computing shortest paths in multimodal transportation networks. In Transportation Research Procedia; Elsevier: Amsterdam, The Netherlands, 2015; pp. 745–755. [Google Scholar] [CrossRef]
Chau, M.L.Y.; Gkiotsalitis, K. A Systematic Literature Review on the Use of Metaheuristics for the Optimisation of Multimodal Transportation; Springer Science and Business Media Deutschland GmbH: Berlin/Heidelberg, Germany, 2025. [Google Scholar] [CrossRef]
Liu, S.; Peng, Y.; Song, Q.; Zhong, Y. The robust shortest path problem for multimodal transportation considering timetable with interval data. Syst. Sci. Control Eng. 2018, 6, 68–78. [Google Scholar] [CrossRef]
López, D.; Lozano, A. Techniques in multimodal shortest path in public transport systems. In Transportation Research Procedia; Elsevier: Amsterdam, The Netherlands, 2014; pp. 886–894. [Google Scholar] [CrossRef]
Chen, X.; Kim, I. Modelling Rail-Based Park and Ride with Environmental Constraints in a Multimodal Transport Network. J. Adv. Transp. 2018, 2018, 2310905. [Google Scholar] [CrossRef]
Hao, L.; Jin, J.G.; Zhao, K. Joint scheduling of barges and tugboats for river–sea intermodal transport. Transp. Res. Part E Logist. Transp. Rev. 2023, 173, 103097. [Google Scholar] [CrossRef]
Wang, P.; Qin, J.; Li, J.; Wu, M.; Zhou, S.; Feng, L. Optimal Transshipment Route Planning Method Based on Deep Learning for Multimodal Transport Scenarios. Electronics 2023, 12, 417. [Google Scholar] [CrossRef]
Peng, Y.; Ma, A.; Yu, D.Z.; Zhao, T.; Xiang, C. Time-Dependent Shortest Path Optimization in Urban Multimodal Transportation Networks with Integrated Timetables. Vehicles 2025, 7, 43. [Google Scholar] [CrossRef]
Hu, Y.; Dong, T.; Li, S. Coordinating ride-pooling with public transit using Reward-Guided Conservative Q-Learning: An offline training and online fine-tuning reinforcement learning framework. Transp. Res. Part C Emerg. Technol. 2025, 174, 105051. [Google Scholar] [CrossRef]
Zhang, T.; Cheng, J.; Zou, Y. Multimodal transportation routing optimization based on multi-objective Q-learning under time uncertainty. Complex Intell. Syst. 2024, 10, 3133–3152. [Google Scholar] [CrossRef]
Lai, X.; Yang, Z.; Xie, J.; Liu, Y. Reinforcement Learning in Transportation Research: Frontiers and Future Directions; Elsevier B.V.: Amsterdam, The Netherlands, 2024. [Google Scholar] [CrossRef]
Farazi, N.P.; Zou, B.; Ahamed, T.; Barua, L. Deep Reinforcement Learning in Transportation Research: A Review; Elsevier Ltd.: Amsterdam, The Netherlands, 2021. [Google Scholar] [CrossRef]
Koh, S.; Zhou, B.; Fang, H.; Yang, P.; Yang, Z.; Yang, Q.; Guan, L.; Ji, Z. Real-time deep reinforcement learning based vehicle navigation. Appl. Soft Comput. 2020, 96, 106694. [Google Scholar] [CrossRef]
Rezzai, M.; Dachry, W.; Moutaouakkil, F.; Medromi, H. Reinforcement learning for traffic control system: Study of Exploration methods using Q-learning. Int. Res. J. Eng. Technol. 2008, 9001, 1838. Available online: www.irjet.net (accessed on 16 September 2025).
Codeca, L.; Cahill, V. Using Deep Reinforcement Learning to Coordinate Multi-Modal Journey Planning with Limited Transportation Capacity. Sumo Conf. Proc. 2022, 2, 13–32. [Google Scholar] [CrossRef]
Chu, K.F.; Guo, W. Deep reinforcement learning of passenger behavior in multimodal journey planning with proportional fairness. Neural. Comput. Appl. 2023, 35, 20221–20240. [Google Scholar] [CrossRef]
Feng, S.; Duan, P.; Ke, J.; Yang, H. Coordinating ride-sourcing and public transport services with a reinforcement learning approach. Transp. Res. Part C Emerg. Technol. 2022, 138, 103611. [Google Scholar] [CrossRef]
El Moufid, M.; Nadir, Y.; Boukhdir, K.; Benhadou, S.; Medromi, H. A Distributed Approach based on Transition Graph for Resolving Multimodal Urban Transportation Problem. 2019. Available online: www.ijacsa.thesai.org (accessed on 10 September 2025).
Liu, L.; Mu, H.; Yang, J. Toward algorithms for multi-modal shortest path problem and their extension in urban transit network. J. Intell. Manuf. 2017, 28, 767–781. [Google Scholar] [CrossRef]
Dib, O.; Manier, M.A.; Moalic, L.; Caminada, A. A multimodal transport network model and efficient algorithms for building advanced traveler information systems. In Transportation Research Procedia; Elsevier B.V.: Amsterdam, The Netherlands, 2017; pp. 134–143. [Google Scholar] [CrossRef]
von Ferber, C.; Holovatch, T.; Holovatch, Y.; Palchykov, V. Modeling Metropolis Public Transport. September 2007. Available online: http://arxiv.org/abs/0709.3203 (accessed on 15 November 2025).
Fujimoto, S.; Meger, D.; Precup, D.; Nachum, O.; Gu, S.S. Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error. arXiv 2022, arXiv:2201.12417. [Google Scholar] [CrossRef]

Figure 1. Transfer graph illustrative case.

Figure 2. Comparison of approaches in terms of CPU execution time.

Figure 3. Comparison of approaches in terms of RAM usage.

Figure 4. Reinforcement Learning Interaction with the Intermodal Transfer Graph Environment.

Figure 5. Agent Success Rate over episodes.

Figure 6. Path Efficiency Over Time.

Figure 7. Optimal Path Obtained Using a DQN Model with Transfer-Graph Integration.

Table 1. Example of time-dependence for the transfer graph.

Train			Tramway			Bus			Busway			Parking
Edges	Time	Cost	Edges	Time	Cost	Edges	Time	Cost	Edges	Time	Cost	Arcs	Cost
a → b	1 → 2	3	a → h	1 → 3	2	c → k	5 → 6	4	c → o	2 → 3	2	P₁ → P₂	5
a → d	2 → 4	2	b → g	4 → 5	4	c → l	2 → 4	2	c → k	4 → 6	3	P₁ → P₃	2
a → f	3 → 5	4	c → g	1 → 3; 4 → 6	5; 4	c → m	1 → 3	6	j → n	3 → 5; 4 → 6	2; 3	P₂ → P₁	6
b → a	10 → 12; 12 → 13	4; 4	c → i	2 → 4; 4 → 6	2; 2	j → l	6 → 7; 2 → 5	4; 4	k → c	2 → 5; 5 → 7	2; 3	P₃ → P₂	3
b → c	2 → 4; 3 → 5	3; 3	g → b	3 → 6; 6 → 8	3; 3	j → k	3 → 5; 8 → 9	1; 2	k → o	8 → 9; 9 → 10	1; 2	P₂ → P₃	2
c → b	9 → 10; 10 → 12	5; 3	g → c	2 → 4	3	k → c	4 → 6; 5 → 8	1; 3	n → j	2 → 5; 5 → 7	3; 2	P₃ → P₁	4
c → f	4 → 6; 5 → 6	4; 4	g → h	2 → 4; 5 → 6	4; 5	k → j	4 → 6; 5 → 6	3; 4	n → o	1 → 3; 4 → 7	3; 2
f → a	6 → 7; 6 → 8	4; 2	h → a	3 → 4; 6 → 7	1; 2	l → c	7 → 9; 5 → 8	4; 3	o → c	3 → 5; 8 → 9	2; 4
f → e	4 → 6	3	h → g	2 → 4; 3 → 5	3; 3	l → j	1 → 2; 2 → 4	3; 2	o → k	5 → 7; 6 → 7	3; 4
f → c	3 → 4; 7 → 9	2; 3	h → i	4 → 5; 3 → 4	2; 3	l → m	6 → 8; 5 → 8	2; 2	o → n	3 → 5; 9 → 10	3; 3
e → f	2 → 3; 5 → 6	2; 2	i → c	2 → 4; 4 → 5	2; 2	m → c	3 → 4; 2 → 3	3; 3	i → k	7 → 8; 8 → 9	2; 3
e → d	3 → 7	3	i → h	6 → 7	3	m → l	5 → 9; 4 → 7	1; 2
d → e	4 → 5; 2 → 3	2; 3
d → a	6 → 8	4
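
To illustrate how the time-dependent entries of Table 1 could be queried, the following is a minimal sketch of an earliest-arrival search over the Train subnetwork only. It rests on several assumptions that the table itself does not state: each "x → y" in the Time column is read as a (departure, arrival) pair, the i-th cost is paired with the i-th time window, waiting at a vertex is free, and a service can be boarded at the instant of arrival. The names `TRAIN` and `earliest_arrival` are hypothetical, and the routine is a generic time-dependent Dijkstra variant, not the authors' implementation.

```python
import heapq

# Train column of Table 1 as {(u, v): [(departure, arrival, cost), ...]}.
# Assumption: "x -> y" in the Time column means depart at x, arrive at y.
TRAIN = {
    ("a", "b"): [(1, 2, 3)],
    ("a", "d"): [(2, 4, 2)],
    ("a", "f"): [(3, 5, 4)],
    ("b", "a"): [(10, 12, 4), (12, 13, 4)],
    ("b", "c"): [(2, 4, 3), (3, 5, 3)],
    ("c", "b"): [(9, 10, 5), (10, 12, 3)],
    ("c", "f"): [(4, 6, 4), (5, 6, 4)],
    ("f", "a"): [(6, 7, 4), (6, 8, 2)],
    ("f", "e"): [(4, 6, 3)],
    ("f", "c"): [(3, 4, 2), (7, 9, 3)],
    ("e", "f"): [(2, 3, 2), (5, 6, 2)],
    ("e", "d"): [(3, 7, 3)],
    ("d", "e"): [(4, 5, 2), (2, 3, 3)],
    ("d", "a"): [(6, 8, 4)],
}


def earliest_arrival(timetable, source, target, start_time=0):
    """Return (arrival_time, path) for the earliest arrival, or None if unreachable."""
    adjacency = {}
    for (u, v), services in timetable.items():
        adjacency.setdefault(u, []).append((v, services))

    best = {source: start_time}   # earliest known arrival time per vertex
    parent = {}                   # predecessor on the earliest-arrival path
    heap = [(start_time, source)]

    while heap:
        t, u = heapq.heappop(heap)
        if t > best.get(u, float("inf")):
            continue              # stale queue entry
        if u == target:
            break
        for v, services in adjacency.get(u, []):
            # Only services departing at or after the current time can be boarded.
            arrivals = [arr for dep, arr, _cost in services if dep >= t]
            if not arrivals:
                continue
            arr = min(arrivals)
            if arr < best.get(v, float("inf")):
                best[v] = arr
                parent[v] = u
                heapq.heappush(heap, (arr, v))

    if target not in best:
        return None
    path = [target]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return best[target], path[::-1]


print(earliest_arrival(TRAIN, "a", "c"))
print(earliest_arrival(TRAIN, "a", "e"))
```

Because waiting is allowed, reaching a vertex earlier never rules out a later departure, which is what makes the label-setting search valid here. The sketch minimizes arrival time only; the Cost column is carried in the data but not used in the objective.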

Table 2. Algorithmic Performance by Network Size: Execution Time and Memory Usage.

|V|	Dijkstra Time (s)	ACO Time (s)	DFS Time (s)	GA Time (s)	Dijkstra Memory (MB)	ACO Memory (MB)	DFS Memory (MB)	GA Memory (MB)
50	0.072	3.2	6.46	1.39	58.04	58.09	58.11	58.12
100	0.164	5.32	13.51	2.36	59.04	59.15	59.16	59.17
200	1.056	16.75	110.269	5.20	60.59	60.74	60.81	60.82
500	4.018	58.23	298.51	7.13	64.94	65.33	64.87	65.12
1000	10.97	125.13	853.38	13.73	73.31	74.31	71.46	71.57
2000	40.76	467.14	2697.25	28.179	90.56	91.01	69.99	69.8
4000	148.57	2374.35	31,474.67	72.618	131.53	127.6	71.33	83.54
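
As a reading aid for Table 2, the short sketch below computes how many times slower (or faster) each method is than Dijkstra at every network size. The timings are transcribed from the table; the assumption that the GA time column is in seconds, like the other time columns, is ours, and the names `TIMES` and `slowdown_vs_dijkstra` are hypothetical.

```python
# Execution times transcribed from Table 2, keyed by number of vertices.
# Assumption: GA times are in seconds, like the other time columns.
TIMES = {
    50:   {"Dijkstra": 0.072,  "ACO": 3.2,     "DFS": 6.46,     "GA": 1.39},
    100:  {"Dijkstra": 0.164,  "ACO": 5.32,    "DFS": 13.51,    "GA": 2.36},
    200:  {"Dijkstra": 1.056,  "ACO": 16.75,   "DFS": 110.269,  "GA": 5.20},
    500:  {"Dijkstra": 4.018,  "ACO": 58.23,   "DFS": 298.51,   "GA": 7.13},
    1000: {"Dijkstra": 10.97,  "ACO": 125.13,  "DFS": 853.38,   "GA": 13.73},
    2000: {"Dijkstra": 40.76,  "ACO": 467.14,  "DFS": 2697.25,  "GA": 28.179},
    4000: {"Dijkstra": 148.57, "ACO": 2374.35, "DFS": 31474.67, "GA": 72.618},
}


def slowdown_vs_dijkstra(times):
    """Return {size: {algorithm: time / Dijkstra time}} for the other algorithms."""
    return {
        size: {
            name: t / row["Dijkstra"]
            for name, t in row.items()
            if name != "Dijkstra"
        }
        for size, row in times.items()
    }


ratios = slowdown_vs_dijkstra(TIMES)
for size in sorted(ratios):
    line = ", ".join(f"{name}: {r:.2f}x" for name, r in ratios[size].items())
    print(f"|V| = {size}: {line}")
```

A ratio above 1 means the method took longer than Dijkstra on that instance. Under the stated unit assumption, the GA ratio drops below 1 for the 2000- and 4000-vertex instances, while ACO and DFS remain slower than Dijkstra at every size.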

Table 3. Comparison of real-world intermodal public transport networks and the equivalent instances used in experiments [30].

Real Transport Networks	Vertices |V|	Description	Tested Equivalent Instance |V|
Berlin	2996	Medium-size multimodal PTN (U-Bahn, S-Bahn, tram, bus)	3000
Düsseldorf	1544	Compact multimodal system with tram, metro, and bus services	2000
Moscow	3755	High-frequency metro + bus/tram layers	4000
Sydney	2034	Multimodal (train, tram, ferry, bus)	2000
Casablanca	1600	Intermodal network integrating tram, busway, bus and parking	2000
Istanbul	4043	Rapidly growing multimodal system	2000
Paris	4003	Dense multimodal system with extensive transfers	4000
Min	1544	Smallest representative metropolitan PTN	2000
Max	4043	Largest European-scale PTN in this comparison	4000

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Published by MDPI on behalf of the International Institute of Knowledge Innovation and Invention. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
