Operator Learning with Branch–Trunk Factorization for Macroscopic Short-Term Speed Forecasting

Yu, Bin; Chen, Yong; Luo, Dawei; Bae, Joonsoo

doi:10.3390/data10120207

¹ School of Economics and Management, Changzhou Vocational Institute of Mechatronic Technology, Changzhou 213164, China

² School of Digital Economy, Changzhou College of Information Technology, Changzhou 213164, China

³ Department of Industry & Information Systems Engineering, Jeonbuk National University, 567, Baekje-daero, Deokjin-gu, Jeonju 54896, Republic of Korea

* Author to whom correspondence should be addressed.

Data 2025, 10(12), 207; https://doi.org/10.3390/data10120207

Submission received: 11 October 2025 / Revised: 10 December 2025 / Accepted: 11 December 2025 / Published: 12 December 2025

(This article belongs to the Topic Advanced Techniques and Modeling in Business and Economics)

Abstract

Logistics operations demand real-time visibility and rapid response, yet minute-level traffic speed forecasting remains challenging due to heterogeneous data sources and frequent distribution shifts. This paper proposes a Deep Operator Network (DeepONet)-based framework that treats traffic prediction as learning a mapping from historical states and boundary conditions to future speed states, enabling robust forecasting under changing scenarios. We project logistics demand onto a road network to generate diverse congestion scenarios and employ a branch–trunk architecture to decouple historical dynamics from exogenous contexts. Experiments on both a controlled simulation dataset and the real-world Metropolitan Los Angeles (METR-LA) benchmark demonstrate that the proposed method outperforms classical regression and deep learning baselines in cross-scenario generalization. Specifically, the operator learning approach effectively adapts to unseen boundary conditions without retraining, establishing a promising direction for resilient and adaptive logistics forecasting.

Keywords: logistics forecasting; operator learning; spatiotemporal modeling

1. Introduction

Short-term speed prediction, which involves forecasting vehicle or traffic speeds over brief future intervals, is a cornerstone technology for modern Intelligent Transportation Systems (ITS) and the advancement of autonomous vehicles [1]. With the emergence of new formats such as front warehouses, community retail, local warehouses, and hourly delivery, the integration between logistics and the national economy has deepened, significantly increasing the demand of supply chains for real-time visibility and rapid responsiveness. Smart logistics, supported by information technology, control techniques, optimization methods, and artificial intelligence, aims to reduce costs and increase efficiency across the entire supply chain through order allocation, vehicle management, route planning, and signal optimization [2]. To maintain stable operations and quickly recover from disruptions, road networks require predictability. Traffic forecasting [3], a core capability, encompasses key quantities such as traffic states, road speeds, and travel times. Accurate prediction of states and speeds provides the foundation for platoon control, route guidance, and signal optimization. Travel time prediction serves as an early indicator for scheduling and coordination. In digital-twin-driven online simulations, forecasting further supports rolling evaluation and scenario selection. For safe and resilient operations, speed prediction also enables risk identification and early warning, allowing interventions in high-risk spatiotemporal segments to reduce accidents and delays, ultimately improving punctuality and network reliability.

However, short-term road speed forecasting at the minute level faces multiple challenges in real-world environments. The first challenge lies in the heterogeneity and noise at the data level. Vehicle operation data coexist with multi-source sensor data, where missing values, measurement errors, and irregular sampling are common. The spatial coverage of the sensing network is uneven: it is dense in central urban areas but sparse in suburban regions, which results in coverage gaps and biased measurements. The second challenge is the complexity of spatiotemporal coupling. Traffic data simultaneously contain static structures and dynamic evolution, with prominent cross-scale dependencies and nonlinear interactions. Deep learning has advanced spatiotemporal prediction by learning expressive, data-driven representations. Recent graph and sequence models capture spatial diffusion and temporal dependencies, improving traffic flow and speed forecasting [4]. The third challenge is nonstationarity and distribution shift. Conventional neural prediction models map vectors to vectors and typically require retraining or extensive fine-tuning when exogenous or boundary conditions change. Demand fluctuations, incidents, and weather conditions, as well as modifications to road networks and timetables, occur frequently. These changes make models trained under previous conditions prone to mismatches in new scenarios, and the costs of maintenance and retraining remain high. Consequently, there is a need for modeling paradigms that can explicitly incorporate boundary changes at the input level while maintaining stable accuracy and reducing maintenance costs when scenarios change.

To address these challenges, this study proposes a short-term road speed forecasting framework that directly integrates logistics data with traffic prediction. To the best of our knowledge, no prior research has systematically mapped supply chain information—such as warehouse and customer locations or dynamic demand volumes—into traffic speed prediction while simultaneously applying Deep Operator Network (DeepONet) learning [5] to achieve cross-scenario transferability. We conduct an initial exploration in this direction by projecting logistics demand and warehouse allocation onto the road network, creating learnable boundary conditions, and then applying an operator-learning approach to map historical sequences and contextual information to next-step speed predictions. This provides a novel perspective for bridging supply chain systems and traffic systems. At the data level, we develop a unified data and evaluation pipeline that performs alignment, validity checks, anomaly removal, and feature standardization. We then split the data into training, validation, and test sets according to different scenarios, enabling us to evaluate the models’ robustness under diverse boundary combinations. At the modeling level, we adopt a branch–trunk architecture. The branch network encodes historical speed sequences of each link to capture short-term dynamics. The trunk network encodes contemporaneous exogenous and boundary states such as inflow, outflow, density, occupancy, waiting time, and travel time that represent congestion intensity and downstream constraints. Multiplicative coupling of the two networks creates a mapping from functions to functions, allowing boundary changes to enter the inference process through input variation. This approach maintains accuracy while reducing the need for retraining when scenarios change.

We constructed six Simulation of Urban MObility (SUMO) scenarios, labeled S001–S006, based on a five-kilometer urban subnetwork, using a time step of 60 s to generate link-level data. These scenarios are driven by the Solomon dataset and differ in random seeds, total trip volumes, and order–warehouse allocation strategies. These variations produce distinct Origin–Destination (OD) combinations, which characterize the paired relationships between origins and destinations, their intensities, and their temporal distributions. These scenarios also include order quantities, vehicle counts or trip numbers, departure times, and service time windows for each OD pair. Different OD combinations determine the spatial and temporal distributions of inflows and outflows across the road network, which in turn shape congestion patterns and boundary conditions, leading to varying levels of prediction difficulty and transfer challenges. For all scenarios, we extract speed, inflow, outflow, density, occupancy, waiting time, and travel time. Inputs are constructed from twelve-step historical speeds together with six contemporaneous contextual features, while the next-step speed serves as the supervisory signal. After validity checks and anomaly filtering, approximately 1.19 million edge–time samples remain. To evaluate cross-scenario transfer, we use S001–S004 for training and validation and reserve S005–S006 as unseen test sets. Within the visible scenarios, we apply an 80/20 temporal split to ensure leakage-free evaluation that encompasses a variety of boundary conditions. To quantify the benefits of the proposed approach in modeling nonlinearities and history–context interactions, we systematically compare it with Ridge regression, Multilayer Perceptrons (MLP), Long Short-Term Memory (LSTM) networks, Temporal Convolutional Networks (TCNs), Transformers, and Graph Neural Networks (GNNs). 
We further conduct ablation studies to verify the necessity of trunk-side exogenous variables and perform counterfactual perturbations of these variables to illustrate the model’s sensitivity and robustness to congestion transitions. Results demonstrate that the proposed design mitigates feature bias caused by heterogeneous and noisy data, improves adaptability to distribution shifts, and enhances the representation of complex spatiotemporal interactions. Challenges such as missing-data handling, explicit spatial coupling, and uncertainty quantification are discussed in the limitations and reserved for future research.

The contributions of this paper are summarized as follows:

This study constructs a unified logistics–traffic dataset by integrating Solomon demand data with SUMO-generated link-level states, producing approximately 1.2 million edge–time samples across six distinct scenarios. This dataset provides a reproducible foundation for cross-scene forecasting research.
We propose a DeepONet-based framework that decouples historical speeds, processed through a branch network, from contemporaneous exogenous and boundary states, processed through a trunk network. This approach enables boundary changes to be incorporated as functional inputs, eliminating the need for frequent retraining.
This paper systematically compares the strengths and weaknesses of the proposed method against classic and state-of-the-art models across three distinct datasets. While the results indicate that DeepONet does not outperform every baseline in every aspect, the comprehensive evaluation demonstrates that it achieves the optimal overall performance, particularly in terms of generalization and robustness. These comparative insights provide a valuable reference for future research in selecting appropriate modeling paradigms for complex traffic scenarios.

The remainder of the paper is organized as follows: Section 2 reviews related work in logistics forecasting, traffic prediction, and operator learning. Section 3 introduces the background and formal problem statement. Section 4 describes the data design, feature construction, and the DeepONet architecture along with training protocols. Section 5 presents the experimental results, diagnostics, and ablation studies, followed by a discussion on deployment implications. Section 6 concludes the paper and outlines limitations and future directions.

2. Related Work

Short-term speed prediction is not a monolithic concept. Its definition, particularly the duration of the prediction horizon, is highly dependent on the application context. The field is broadly divided into two categories: macroscopic traffic flow forecasting and microscopic vehicle dynamics prediction. In this work, we focus on the former, which aims to predict aggregated traffic speed, typically the average speed of all vehicles on a specific road segment, over short horizons at the link or corridor level. This task is essential for traffic management, signal control, and route guidance. The prediction horizon in this context generally spans minutes, often ranging from 1 to 30 min [6]. In this work, the time interval for prediction is set to 1 min. The latter category focuses on individual vehicle trajectories and maneuvers over very short horizons and is crucial for autonomous driving and collision avoidance. The prediction horizons in this context are up to 10 s [7].

2.1. Application of Deep Learning Method in Macroscopic Short-Term Speed Forecasting

Classical macroscopic speed forecasting methods include statistical models such as AutoRegressive Integrated Moving Average (ARIMA) [8] and Kalman filters [9], which are effective for stationary regimes but limited in handling nonlinearity and dynamic boundaries. Simulation platforms like AnyLogic (V8), FlexSim (V2024), and SUMO (V1.24.0) [10] are widely used for prototyping and assessing operations; however, they depend heavily on calibration quality and face scalability challenges [11]. With the proliferation of ubiquitous sensing and digital infrastructure, deep learning has become a central paradigm for spatiotemporal prediction. Recent surveys highlight the rapid evolution of deep learning for traffic forecasting, emphasizing the shift from simple time-series models to complex graph-based and physics-informed architectures [12]. Various neural network architectures have been proposed for traffic prediction tasks. Besides MLPs, Convolutional Neural Networks (CNNs) and sequence models such as LSTM and TCN [13] are widely applied in time-series prediction tasks. More recently, approaches combining graph neural networks with differential equations [14] and attention mechanisms [15] have shown promise in capturing complex spatiotemporal dynamics. Advanced architectures like Dynamic Spatial Transformers [4], Gated Attention Graph Networks [16], and Diffusion-Enhanced Transformer Neural Operators [17] further push the boundaries of prediction accuracy by integrating low-rank tensor compression, multi-scale attention, and generative diffusion processes. Despite accuracy gains, many architectures remain brittle under distribution shift and require costly retraining when exogenous or boundary conditions like inflow or occupancy change.

A persistent challenge in macroscopic speed forecasting is ensuring transferability across different scenes. Distribution shifts, sparse sampling, and sensor noise degrade model performance outside the training domain [18]. In traffic forecasting, models trained in one city or corridor often underperform when applied to another without adaptation [19]. Domain adaptation techniques and adversarial alignment provide partial solutions but frequently require substantial retraining and engineering effort. Similar concerns arise in supply chain forecasting [20]. Furthermore, the opacity of deep models complicates deployment in safety-critical logistics operations where auditability is required. These limitations motivate frameworks that can natively accommodate boundary variability and enable transparent analysis. Operator learning [21], which maps functions to functions, offers a promising avenue to address these challenges.

2.2. Operator Learning in Scientific Machine Learning

Operator learning is supported by the Universal Approximation Theorem for operators [21]. A recurring theme is improved generalization under parametric and boundary changes, a property directly relevant to logistics, where exogenous conditions frequently evolve. Operator learning emerged in scientific machine learning to directly approximate mappings between function spaces when classical vector-to-vector learning is inadequate for tasks such as partial differential equation (PDE) solution operators, fractional operators, or control-to-state maps [22]. Recent advances include U-shaped neural operators [23] and physics-informed extensions [24], which further enhance the capability to solve complex PDE-governed systems. Its mathematical foundation extends universal approximation results from finite-dimensional functions to operators on compact subsets of Banach spaces [25]. If an operator is continuous on a compact set of admissible inputs, then a suitably parameterized neural operator can approximate it uniformly on that set. This perspective justifies learning function-to-function maps rather than compressing all information into fixed-size vectors.

Physics-informed deep learning has also been applied to traffic state estimation, integrating conservation laws with data-driven models [26]. These methods leverage the underlying physics of traffic flow to improve generalization and data efficiency, aligning well with the operator learning perspective adopted in this work. Recent studies have explicitly benchmarked neural operators for traffic state estimation [5] and explored their use in boundary stabilization control [27], validating their potential for both prediction and management. Furthermore, the integration of traffic prediction uncertainty into logistics operations, such as parking reservation, has been highlighted as a critical direction for practical deployment [28].

We consider an operator

$$ G : V \subset C(K_{1}) \longrightarrow C(K_{2}), \qquad u \longmapsto (G(u))(y), \quad y \in K_{2}, \tag{1} $$

where $u$ may encode source terms, initial and boundary conditions, or control signals. The variable $y$ denotes an evaluation location, which may include spatial coordinates, time, or other query parameters. Training data are triples $(u^{(i)}, y^{(i)}, G(u^{(i)})(y^{(i)}))$. To obtain a finite representation of the infinite-dimensional input $u$, choose sensor points $\{x_{j}\}_{j=1}^{m} \subset K_{1}$ and form

$$ \mathbf{u} = [u(x_{1}), u(x_{2}), \dots, u(x_{m})]^{\top} \in \mathbb{R}^{m}. \tag{2} $$

Operator learning parameterizes $G$ with two subnetworks and a bilinear fusion. The branch network $g : \mathbb{R}^{m} \to \mathbb{R}^{p}$ encodes the input-function samples $\mathbf{u}$, and the trunk network $f : K_{2} \to \mathbb{R}^{p}$ encodes the query $y$. The prediction is

$$ \hat{G}(u)(y) = \langle g(\mathbf{u}), f(y) \rangle + b_{0} = \sum_{k=1}^{p} g_{k}(\mathbf{u}) f_{k}(y) + b_{0}, \tag{3} $$

where $p$ is the embedding dimension, which is interpretable as the rank of a low-rank expansion. Here, $g_{k}$ and $f_{k}$ are the $k$-th components of the branch and trunk embeddings respectively, and $b_{0}$ is an optional bias. This formulation realizes the operator mapping by conditioning on $u$ through the branch embedding and evaluating at an arbitrary $y$ through the trunk embedding, without requiring explicit convolutions or kernels. Consequently, it accommodates irregular geometries and unaligned samples. Additional context $c$, such as material or scenario parameters, can be concatenated to the branch input, $g([\mathbf{u}; c])$, or to the trunk input, $f([y; c])$.
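To make Equations (2) and (3) concrete, the following is a minimal, dependency-free Python sketch of the sensor discretization and the branch–trunk inner product. It is not the architecture used in this paper: the single tanh layer per subnetwork, the random untrained weights, the sensor grid on [0, 1], and the helper names `sample_at_sensors`, `dense`, `init_layer`, and `deeponet_predict` are all illustrative assumptions.

```python
import math
import random

def sample_at_sensors(u, sensors):
    """Discretize an input function u at fixed sensor points (Eq. 2)."""
    return [u(x) for x in sensors]

def dense(weights, bias, x):
    """One fully connected layer with tanh activation."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def init_layer(n_in, n_out, rng):
    """Small random weights; a stand-in for trained parameters."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias

def deeponet_predict(branch, trunk, b0, u_vec, y):
    """Eq. (3): inner product of branch and trunk embeddings plus a bias."""
    g = dense(*branch, u_vec)   # branch embedding g(u) in R^p
    f = dense(*trunk, y)        # trunk embedding f(y) in R^p
    return sum(gk * fk for gk, fk in zip(g, f)) + b0

rng = random.Random(0)
m, p = 8, 4                                  # sensor count and embedding rank (illustrative)
sensors = [j / (m - 1) for j in range(m)]    # assumed sensor points on K_1 = [0, 1]
branch = init_layer(m, p, rng)               # g: R^m -> R^p
trunk = init_layer(1, p, rng)                # f: K_2 -> R^p with a scalar query y

u_vec = sample_at_sensors(math.sin, sensors)
prediction = deeponet_predict(branch, trunk, 0.0, u_vec, [0.3])
print(prediction)
```

Because both embeddings pass through tanh, the untrained output is bounded in magnitude by p; in practice the two subnetworks would be deeper and trained jointly.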

2.3. Physical Interpretation of the Architecture

In the context of traffic forecasting, the branch and trunk networks play distinct physical roles. The branch network processes the historical speed sequence $s_{t-L+1:t}(e)$, which represents the system inertia or the short-term momentum of traffic flow. This captures the intrinsic dynamics of the vehicles currently on the road. The trunk network processes the contemporaneous context $u_{t}(e)$. Specifically, variables such as density and occupancy quantify the local congestion intensity, while flow variables (entered, left) represent the boundary constraints and mass exchange with the network. This context represents the boundary conditions and external constraints acting on the flow. The inner product $\langle g(\cdot), f(\cdot) \rangle$ then models the coupling between the system’s inertial state and the external environment. This factorization allows the model to learn how different boundary conditions (trunk) modulate the evolution of traffic dynamics (branch), thereby enabling generalization to new scenarios where the boundary conditions change but the underlying physics of flow remains consistent [21].

The standard training objective of the operator is empirical risk minimization with mean-squared error:

$$ L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{G}_{\theta}(u^{(i)})(y^{(i)}) - G(u^{(i)})(y^{(i)}) \right)^{2}, \tag{4} $$

where $\theta$ collects the parameters of $g$ and $f$. When physics constraints are available, one may add a residual term in strong or weak form, for example

$$ L_{\mathrm{total}} = L_{\mathrm{data}} + \lambda_{\mathrm{phys}} \frac{1}{M} \sum_{r=1}^{M} \left| N_{y}[\hat{G}_{\theta}(u^{(i)})(y_{r})] - q(y_{r}) \right|^{2}, \tag{5} $$

where $N_{y}[\cdot] = q$ encodes the governing operator in $y$ and source $q$. This couples operator learning with physics-informed regularization.
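As a small worked illustration of Equations (4) and (5), the sketch below evaluates the mean-squared data loss over training triples and adds a weighted physics penalty. The toy operator, the triples, and the residual values are hypothetical and chosen only so the arithmetic can be checked by hand; the physics residuals are assumed to be computed elsewhere.

```python
def mse_loss(predict, triples):
    """Empirical risk of Eq. (4) over (u_vec, y, target) triples."""
    residuals = [predict(u_vec, y) - target for u_vec, y, target in triples]
    return sum(r * r for r in residuals) / len(residuals)

def total_loss(data_loss, physics_residuals, lam_phys):
    """Eq. (5): data loss plus the weighted mean squared physics residual."""
    penalty = sum(r * r for r in physics_residuals) / len(physics_residuals)
    return data_loss + lam_phys * penalty

def toy_operator(u_vec, y):
    """Hypothetical ground truth: G(u)(y) = y * mean(u)."""
    return y * sum(u_vec) / len(u_vec)

def biased_model(u_vec, y):
    """Hypothetical imperfect model with a constant offset of 0.5."""
    return toy_operator(u_vec, y) + 0.5

triples = [
    ([1.0, 2.0, 3.0], 0.5, 1.0),
    ([0.0, 0.0, 0.0], 2.0, 0.0),
    ([2.0, 2.0, 2.0], 1.0, 2.0),
]

exact_loss = mse_loss(toy_operator, triples)      # model equals the operator
biased_loss = mse_loss(biased_model, triples)     # every residual is 0.5
combined = total_loss(biased_loss, [1.0, -1.0], lam_phys=0.1)
print(exact_loss, biased_loss, combined)
```

The weight `lam_phys` plays the role of $\lambda_{\mathrm{phys}}$ and trades data fit against consistency with the governing equation.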

After training, inference proceeds in two steps. Given a new input function $u^{*}$, evaluate it on the same sensors to obtain $\mathbf{u}^{*}$ and compute the branch embedding $g(\mathbf{u}^{*}) \in \mathbb{R}^{p}$. For any collection of query locations $y$, compute $f(y) \in \mathbb{R}^{p}$ and take the inner product:

$$ \hat{G}(u^{*})(y) = \langle g(\mathbf{u}^{*}), f(y) \rangle + b_{0}. \tag{6} $$

Changing $u^{*}$ only recomputes the branch output; changing $y$ only recomputes the trunk output, enabling cross-condition generalization and arbitrary-point evaluation. The trunk naturally accepts spatiotemporal queries by setting $y = (x, t)$. For multi-output targets, one may append a small linear head from the scalar output to multiple channels, or use separate embeddings per channel.
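The two-step inference of Equation (6) can be sketched as follows: the branch embedding is computed once per input function and reused across all queries. The closed-form `branch_embed` and `trunk_embed` functions are hypothetical stand-ins for trained networks, with the query taken as $y = (x, t)$.

```python
import math

def branch_embed(u_vec):
    """Hypothetical branch g: R^m -> R^2 (mean and range of the samples)."""
    return [sum(u_vec) / len(u_vec), max(u_vec) - min(u_vec)]

def trunk_embed(y):
    """Hypothetical trunk f: (x, t) -> R^2."""
    x, t = y
    return [math.cos(x), math.exp(-t)]

def evaluate(u_vec, queries, b0=0.0):
    """Eq. (6): compute g(u*) once, then reuse it for every query y."""
    g = branch_embed(u_vec)              # step 1: branch, once per input function
    outputs = []
    for y in queries:                    # step 2: trunk, once per query
        f = trunk_embed(y)
        outputs.append(sum(gk * fk for gk, fk in zip(g, f)) + b0)
    return outputs

u_star = [1.0, 2.0, 3.0]                 # sensor samples of a new input function
queries = [(0.0, 0.0), (0.0, 1.0)]       # spatiotemporal query points (x, t)
values = evaluate(u_star, queries)
print(values)
```

Caching the branch embedding is what makes evaluation at many query points inexpensive once the input function is fixed.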

Practical choices include the sensor count $m$ in Equation (2), where more sensors capture finer details of $u$ but increase cost; the embedding rank $p$ in Equation (3), which controls expressive power; standardization or nondimensionalization of inputs; and lightweight MLP or residual blocks for both branch and trunk. Operator learning has demonstrated strong results across several domains: surrogate modeling for fluid and transport PDEs, fractional and integral operators, stochastic dynamics and filtering, control-to-state and model-predictive-control maps, and multi-physics responses. These successes highlight advantages in cross-condition generalization, handling irregular data, and enabling fast, arbitrary-point evaluations after offline training, properties that are directly useful for real-time macroscopic speed forecasting and decision support.

2.4. Comparison with Classical, Geometric, and Operator Learning

While classical and temporal deep learning models such as Ridge regression, MLP, and LSTM networks effectively capture temporal correlations in stationary time-series, they fundamentally map fixed-size vectors to vectors. This limitation means they lack the mechanism to explicitly handle changing boundary conditions without retraining, often leading to poor generalization under distribution shifts. While GNNs [29] explicitly model spatial dependencies via a fixed adjacency matrix, they often struggle when the network topology changes or when defining the graph structure is ambiguous. In contrast, DeepONet learns a continuous operator that maps functional inputs to outputs, making it naturally mesh-independent and adaptable to varying boundary conditions without retraining. Compared to Fourier Neural Operators (FNOs) [30], which are highly efficient on uniform grids using Fast Fourier Transforms (FFTs), DeepONet offers greater flexibility for irregular geometries and heterogeneous sensor placements common in traffic networks. Our branch–trunk factorization specifically targets the separation of temporal dynamics from exogenous context, a structure that aligns well with the logistics-traffic coupling problem.

This work addresses link-level speed prediction at 60 s resolution to support traffic control and routing. We construct a framework that projects benchmark demand onto a 5 km urban subnetwork and generates microscopic traffic states, producing controlled yet realistic boundary variability for cross-scene transfer. We develop a branch–trunk factorization that disentangles short-history signals from exogenous and boundary context and demonstrate zero-retraining transfer on held-out scenes. We further provide diagnostic and counterfactual analyses that link accuracy gains to regime-consistent behavior and operational interpretability. To our knowledge, the combination of Solomon-driven demand, SUMO-based microscopic states, and operator learning for link-speed forecasting has not been previously reported.

3. Background and Problem Formulation

Motivation and Data Infrastructure for Macroscopic Short-Term Speed Forecasting

Short-horizon, link-level speed forecasts are both urgently needed and practically attainable. Public agencies seek to lower system-wide logistics costs via congestion mitigation and network reliability, while enterprises aim to reduce operating costs through improved transport scheduling, warehouse tasking, and production planning. These objectives are enabled by high-frequency data streams from loop detectors, video counters, Global Positioning System (GPS) trajectories, and connected vehicles, together with platform-level integration of demand, inventory, production, and shipment records. This big data infrastructure aligns public–private needs and supplies the covariates required for minute-scale forecasting in ITS, supporting proactive signal control, dynamic speed limits, incident detection, reliable travel-time estimation, and predictive routing for freight [12]. At present, however, production datasets with the necessary spatial coverage, temporal resolution, and metadata are often inaccessible due to privacy and governance constraints, heterogeneous sensing deployments, missingness, and the difficulty of aligning exogenous and boundary conditions at scale. In this context, controlled data generation remains a practical and rigorous path. It enables reproducible experiments, systematic ablations, and counterfactual stress tests under well-specified distribution shifts. Looking ahead, continued advances in sensing, communications, and digital integration make it increasingly likely that such real-world data will be collected and shared in near real time. Our study therefore develops and evaluates methods in advance of this capability, while using synthesized scenarios to ensure coverage, control, and reproducibility.

We consider a 5 km urban subnetwork, defined as a contiguous district whose total centerline roadway length is approximately 5 km and that contains multiple signalized intersections and boundary inflow and outflow links. The choice of a 5 km scale is deliberate. It matches the control horizon of corridor- and district-level operations, such as coordinated signal control and variable speed advisories, where minute-resolution predictions are most actionable. Moreover, it is small enough to support reproducible, microscopic simulations with rich heterogeneity at manageable computational cost. It provides several boundary links so that exogenous inflow and outflow can vary across scenarios, which is essential for evaluating cross-scene transfer. Demand and customer attributes are taken from the Solomon benchmark and spatially assigned to network nodes, while traffic states are generated with the SUMO microscopic simulator under multiple scenarios [31]. In this research, signals are aggregated at an interval of $\Delta = 60$ s. Training, validation and test splits are performed by scenario to support cross-scene evaluation and to reflect distribution shift considerations [32].

Given a directed road network with edge set $E$, SUMO outputs per-interval measurements for each edge $e \in E$, including mean speed $v_{t}(e)$, density, occupancy, counts of vehicles entering and leaving, average waiting time, and travel time [31]. These indicators summarize instantaneous traffic state and congestion intensity on each link. For each edge $e$ and interval $t$, the goal is to predict the next-interval mean speed $y_{t+1}(e) = v_{t+1}(e)$ from a leakage-safe feature vector that combines short speed histories with contemporaneous exogenous variables:

Architectural details, training protocols, and ablations are provided in Section 4 and Section 5.

4. Methodology

As illustrated in Figure 1, our methodology follows a three-stage pipeline: (i) data and scenario construction where Solomon demand instances are projected and simulated on a 5 km SUMO subnetwork to produce link-level edge states; (ii) feature engineering and dataset assembly that aligns, filters, and standardizes twelve-step speed histories together with contemporaneous exogenous and boundary covariates; and (iii) model learning and diagnostics using a branch–trunk Deep Operator Network that decouples short-term histories from contextual boundary inputs, followed by systematic cross-scene evaluation, ablations, and counterfactual perturbations.

4.1. Solomon Dataset as the Demand Prior

We ground the demand layer in the classical Solomon vehicle routing problem with time windows benchmarks [33]. The suite contains 56 instances with 100 customers, organized into six classes (C1, C2, R1, R2, RC1, RC2), where C/R/RC denote clustered, random, and mixed spatial layouts, and the “1” vs. “2” suffix reflects tighter vs. looser time windows, often implying a shorter vs. longer planning horizon. Each instance places 100 customers on a $100 \times 100$ grid and follows a common schema: node index $i$, coordinates $(x_{i}, y_{i})$, demand $q_{i}$, ready time $e_{i}$, due date $l_{i}$, and service duration $d_{i}$; the depot is node 0. File headers specify the fleet-size limit $K$ and vehicle capacity $Q$. These fields map directly to our SUMO pipeline: coordinates are projected to the network coordinate reference system and snapped to the nearest nodes and edges; depot identifiers anchor origins; time windows drive release and service scheduling to produce temporally consistent OD flows; and demands determine vehicle loading and trip counts. We use Solomon because its controlled spatial patterns and time-window tightness create diverse routing pressures and post-assignment congestion, which is essential for stress-testing forecasting models under heterogeneous boundary conditions.
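The schema above can be read into typed records with a short parser. The sketch below assumes the commonly distributed plain-text layout of Solomon instances (a vehicle line with two numbers, then customer rows with seven numbers); the embedded excerpt, the `Customer` class, and `parse_solomon` are illustrative and are not taken from the authors' pipeline.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Customer:
    index: int       # node index i (0 is the depot)
    x: float         # x-coordinate on the 100 x 100 grid
    y: float         # y-coordinate on the 100 x 100 grid
    demand: float    # q_i
    ready: float     # e_i, start of the time window
    due: float       # l_i, end of the time window
    service: float   # d_i, service duration

def parse_solomon(text):
    """Return (fleet limit K, capacity Q, customers) from instance text."""
    fleet, capacity, customers = None, None, []
    for line in text.splitlines():
        try:
            values = [float(token) for token in line.split()]
        except ValueError:
            continue                          # header or label line
        if len(values) == 2:                  # vehicle line: K and Q
            fleet, capacity = int(values[0]), values[1]
        elif len(values) == 7:                # customer row
            customers.append(Customer(int(values[0]), *values[1:]))
    return fleet, capacity, customers

# Illustrative excerpt in the assumed layout (values are placeholders).
SAMPLE = """EXAMPLE
VEHICLE
NUMBER     CAPACITY
  25         200
CUSTOMER
CUST NO.  XCOORD.  YCOORD.  DEMAND  READY TIME  DUE DATE  SERVICE TIME
    0      40       50        0        0        1236        0
    1      45       68       10      912         967       90
"""

fleet, capacity, customers = parse_solomon(SAMPLE)
print(fleet, capacity, customers[1])
```

Skipping every line that is not purely numeric keeps the parser tolerant of header wording, at the cost of silently ignoring malformed rows.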

4.2. Simulation Environment and Dataset Construction

We consider an urban subnetwork of approximately 5 km imported into SUMO, and instantiate six scenarios S001–S006 that vary random seeds and trip loads to diversify demand [10]. Beyond the static network, each scenario is parameterized by logistics demand and supply. Customer requests and depot locations shape OD patterns and temporal loading, which in turn drive the edge states observed during simulation. We ingest (i) customer planar coordinates $(x, y)$, which are projected to the network coordinate reference system, (ii) demand quantity with units or weight, (iii) requested service time windows $[\underline{t}, \bar{t}]$, and (iv) depot or warehouse identifiers and coordinates. Orders are snapped to nearest edges and nodes and grouped into time buckets to form OD flows or discrete trips consistent with their time windows and depot assignments.
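The snapping and bucketing step can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' procedure: orders are snapped to the nearest node by Euclidean distance, each order becomes one depot-to-customer trip, trips are bucketed by the opening of the time window, and the 900 s bucket width, node coordinates, and function names are hypothetical.

```python
import math
from collections import Counter

def nearest_node(point, nodes):
    """Return the id of the network node closest to a projected point."""
    return min(nodes, key=lambda node_id: math.dist(point, nodes[node_id]))

def build_od_counts(orders, nodes, depot_node, bucket_s=900):
    """Aggregate orders into (time bucket, origin, destination) trip counts."""
    od = Counter()
    for (x, y), ready_s in orders:
        destination = nearest_node((x, y), nodes)
        bucket = int(ready_s // bucket_s)     # bucket of the window opening time
        od[(bucket, depot_node, destination)] += 1
    return od

# Hypothetical projected node coordinates (metres) and orders.
nodes = {"a": (0.0, 0.0), "b": (100.0, 0.0), "c": (0.0, 100.0)}
orders = [
    ((90.0, 10.0), 100.0),    # snaps to "b", bucket 0
    ((5.0, 80.0), 950.0),     # snaps to "c", bucket 1
    ((95.0, 0.0), 1700.0),    # snaps to "b", bucket 1
]

od_counts = build_od_counts(orders, nodes, depot_node="a")
print(dict(od_counts))
```

The resulting counts could then be written out as trip or flow definitions for the simulator; that export step is omitted here.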

Given the OD specification, SUMO produces vehicle- and edge-level traces: (i) per-vehicle routes and traversed edge sequences, and, if needed, per-time step positions; (ii) per-interval edge aggregates, including speed, entered and left, density, occupancy, waitingtime, traveltime; and (iii) per-vehicle summaries. These outputs connect the logistics side (who travels, when, from which depot to which customer, and with how much load) to the traffic side (which edges are used, and with what speeds and queues). This integration enables supervised learning on edge dynamics under realistic boundary conditions. Table 1 summarizes the data sources and their roles in linking logistics demand with traffic states.

From the edge-level output of each scenario, we extract per-edge, per-interval measurements including speed, entered, left, density, occupancy, waiting time, and travel time. To rigorously evaluate the contribution of spatial information, we construct two distinct feature sets:

Baseline Dataset without Spatial Features: This configuration focuses on temporal dynamics and local boundary conditions. The input vector $x_{t}(e)$ concatenates 12 speed lags ($\mathrm{lag}_{1}, \dots, \mathrm{lag}_{12}$) and 6 contemporaneous covariates (density, occupancy, etc.) of the target edge itself, yielding an 18-dimensional input vector. This serves as the primary dataset for benchmarking temporal sequence models.
Spatial Dataset with Spatial Features: To capture network-level dependencies, we augment the baseline features with upstream and downstream context. For each target edge, we identify its immediate predecessor and successor links and append their mean speed and density to the input vector. This increases the input dimensionality to 23, allowing models to explicitly learn from spatial propagation effects.
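The baseline feature construction can be sketched as follows. This is a minimal example under two assumptions: that the six covariates are the non-speed edge aggregates listed earlier (entered, left, density, occupancy, waitingtime, traveltime), and that $\mathrm{lag}_{1}$ denotes the most recent observed speed $v_{t}$. The function name and the dict-based row layout are illustrative only.

```python
# Assumed covariate set: the six non-speed aggregates named in the text.
COVARIATES = ("entered", "left", "density", "occupancy", "waitingtime", "traveltime")
N_LAGS = 12


def build_samples(series):
    """Turn one edge's time-ordered rows into (x, y) pairs.

    x = 12 speed lags (lag_1 = v_t, ..., lag_12 = v_{t-11}) + 6 covariates at t
    y = speed at t + 1
    """
    samples = []
    for t in range(N_LAGS - 1, len(series) - 1):
        lags = [series[t - k]["speed"] for k in range(N_LAGS)]
        context = [series[t][name] for name in COVARIATES]
        samples.append((lags + context, series[t + 1]["speed"]))
    return samples
```

The spatial variant would append the upstream and downstream mean speed and density to `context`; it is omitted here to keep the sketch short.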

We form supervised pairs $(x_{t}(e), y_{t+1}(e))$ using these feature sets, with the scalar target being the next-step speed $y_{t+1}$. The combined dataset has 23,379,799 rows before filtering. To reduce artifacts, we retain rows satisfying validity checks for $\mathrm{traveltime} > 0$, nonnegative counts, and finite speeds [10]. Because traffic in the 5 km subnetwork is sparse, approximately 95% of the raw edge-time samples represent zero-speed or empty-road conditions, which provide limited supervisory signal for learning congestion dynamics. To focus the model on active traffic states, we filtered out these zero-value samples, resulting in a final dataset of approximately 1.19 million samples, so that training is driven by meaningful traffic interactions rather than the dominant background of empty roads. We inspected marginal speed distributions before and after filtering and found that the qualitative ordering of model performance is unchanged; including the full raw set reduces sensitivity to congestion regimes but does not alter the main comparative conclusions reported here.

We exclude the current speed at time t from contemporaneous features to avoid leakage; only lagged speeds are used in inputs. Standardization is fit on training scenarios and applied to validation and test to prevent target or covariate leakage [34]. We split by scenario: S001–S004 supply training and validation, with an 80/20 temporal split within each seen scenario, and S005–S006 form the test set. The resulting sizes are train = 953,351, val = 119,168, and test = 119,168.
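The filtering, splitting, and standardization steps can be sketched as below. This is an illustration under stated assumptions: rows are plain dicts with hypothetical keys (`scenario`, `time`, `speed`, `entered`, `left`, `traveltime`), and "zero-value" samples are taken to mean rows with zero speed. The key point it shows is that the standardizer is fit on the training rows only and then reused unchanged for validation and test.

```python
import math


def is_valid(row):
    """Validity checks: positive travel time, nonnegative counts, finite speed."""
    return (row["traveltime"] > 0
            and row["entered"] >= 0 and row["left"] >= 0
            and math.isfinite(row["speed"]))


def is_active(row):
    """Assumed definition of an active-traffic sample: nonzero speed."""
    return row["speed"] > 0


def split_by_scenario(rows, seen=("S001", "S002", "S003", "S004"),
                      held_out=("S005", "S006"), train_fraction=0.8):
    """Temporal train/val split inside each seen scenario; held-out scenarios form the test set."""
    train, val = [], []
    for scenario in seen:
        ordered = sorted((r for r in rows if r["scenario"] == scenario),
                         key=lambda r: r["time"])
        cut = int(len(ordered) * train_fraction)
        train += ordered[:cut]
        val += ordered[cut:]
    test = [r for r in rows if r["scenario"] in held_out]
    return train, val, test


def fit_standardizer(train_rows, keys):
    """Per-feature (mean, std) computed from training rows only."""
    stats = {}
    for key in keys:
        values = [row[key] for row in train_rows]
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        stats[key] = (mean, std if std > 0 else 1.0)
    return stats


def standardize(rows, stats):
    """Apply previously fitted statistics; never refit on validation or test."""
    return [{**row, **{k: (row[k] - m) / s for k, (m, s) in stats.items()}}
            for row in rows]
```

Fitting the statistics after the split, rather than on the pooled data, is what prevents information about the held-out scenarios from leaking into the inputs.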

4.3. Real-World Dataset

To validate the generalization capability of our framework beyond simulation, we utilize the METR-LA benchmark dataset [35], a widely used reference in traffic forecasting. This dataset collects traffic speed readings from 207 loop detectors on the highways of Los Angeles County, spanning a period of 4 months from 1 March 2012 to 30 June 2012.

Unlike the link-level simulation data, METR-LA provides graph-structured data where sensors are nodes in a network. The adjacency matrix is pre-computed based on the driving distance between sensors, using a Gaussian kernel thresholded to retain only strong connections. The data are aggregated to 5-min intervals, matching the typical control horizon of ITS applications. We use the standard chronological split of 70% training, 10% validation, and 20% testing. This dataset introduces real-world complexities such as sensor noise, missing values, and non-recurrent congestion events, providing a rigorous testbed for evaluating model robustness in complex, nonlinear topologies.
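A thresholded Gaussian kernel of this kind can be sketched as follows. The specific choices here are assumptions for illustration and are not confirmed by the text: the bandwidth `sigma` is taken as the standard deviation of the finite pairwise distances, and kernel weights below a cutoff of 0.1 are set to zero.

```python
import math


def gaussian_adjacency(dist, threshold=0.1):
    """Build a weighted adjacency matrix from pairwise driving distances.

    w_ij = exp(-(d_ij / sigma)^2), zeroed when below `threshold`.
    Unreachable pairs may be encoded as math.inf and receive weight 0.
    """
    n = len(dist)
    finite = [d for row in dist for d in row if math.isfinite(d)]
    mean = sum(finite) / len(finite)
    # Assumed bandwidth: standard deviation of the finite distances.
    sigma = math.sqrt(sum((d - mean) ** 2 for d in finite) / len(finite))
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if math.isfinite(dist[i][j]):
                weight = math.exp(-((dist[i][j] / sigma) ** 2))
                adj[i][j] = weight if weight >= threshold else 0.0
    return adj
```

The sketch does not guard against the degenerate case in which all distances are equal (zero bandwidth); a real pipeline would need to handle that explicitly.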

4.4. Baseline and Comparative Models

We compare (i) naïve persistence, defined as $\hat{y}_{t+1}(e) = v_{t}(e)$ [36]; (ii) Ridge regression, an L2-regularized linear model applied to the 18-dimensional input [37]; (iii) MLP operating on the same 18-dimensional input, a choice supported by modern universal-approximation results [38]; (iv) LSTM, which utilizes the same 12-step window [39]; (v) TCN, employing dilated causal convolutions on the same 12-step window [40]; (vi) Transformer, which incorporates self-attention mechanisms for time-series forecasting [41]; and (vii) GNN, or Graph Neural Network, which explicitly models spatial dependencies via graph convolutions [42]. Unless noted, all models use identical splits and early stopping on validation $R^{2}$ [43]. All baselines consume the same feature set defined above to ensure parity.

Ridge: We fit a linear model on $x_{t}(e) = [s_{t-11:t}(e), z_{t}(e)] \in \mathbb{R}^{18}$:

$\min_{\beta, \beta_{0}} \| y - \beta_{0} \mathbf{1} - X\beta \|_{2}^{2} + \alpha \|\beta\|_{2}^{2},$

(7)

with features standardized using training statistics and intercept $\beta_{0}$. The regularization strength $\alpha$ is selected on a log-grid $\{10^{-6}, \dots, 10^{2}\}$. Ridge offers a strong linear baseline with high inference throughput.
MLP: Two hidden layers of width 256 with Rectified Linear Unit (ReLU) activation, dropout $0.1$, the Adaptive Moment Estimation (Adam) optimizer with a learning rate of $10^{-3}$, batch size 8192, and up to 30 epochs with early stopping on validation.
LSTM: We form a sequence $\{x^{(k)}\}_{k=1}^{12}$ where each step uses the k-th speed lag and the same exogenous context:

$x^{(k)} = [\mathrm{lag}_{k}, z_{t}(e)] \in \mathbb{R}^{1+6},$

(8)

yielding an input tensor of shape $(\mathrm{batch}, \mathrm{time} = 12, \mathrm{feat} = 7)$. A single-layer LSTM (hidden size 128, dropout 0.1) processes the sequence; the last hidden state feeds a linear head to predict $y_{t+1}$. Optimizer: Adam with a learning rate of $10^{-3}$, batch size 8192, 30 epochs, and early stopping.
TCN: We use a causal Temporal Convolutional Network on the same $(12 \times 7)$ sequence: four residual blocks with dilations $[1, 2, 4, 8]$, kernel size 3, 64 channels, and dropout $0.1$; causal padding prevents leakage. The receptive field exceeds 12 steps and therefore covers the whole window. The block output is global-pooled and passed to a linear head. Optimizer and early stopping are applied as described above.
Transformer: We employ a standard Transformer encoder architecture adapted for time-series forecasting. The model consists of 2 encoder layers with 4 attention heads, a model dimension of 64, and a feed-forward dimension of 256. Positional encodings are added to the input sequence to retain temporal order information.
GNN: We utilize a GNN to capture spatial dependencies. For the simulation dataset, the graph is constructed based on physical connectivity, specifically upstream and downstream links. For the METR-LA dataset, we use the predefined sensor adjacency matrix. The model consists of two Graph Convolutional Network (GCN) layers with 64 hidden units followed by a fully connected output layer.
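The receptive-field claim made for the TCN can be checked with a short calculation. For stacked dilated causal convolutions with kernel size $k$, each convolution with dilation $d$ extends the receptive field by $(k-1)d$ steps. The number of convolutions per residual block is not stated in the text, so the sketch below treats it as a parameter (an assumption) and evaluates both one and two convolutions per block.

```python
def receptive_field(kernel_size, dilations, convs_per_block=1):
    """Receptive field (in time steps) of stacked dilated causal convolutions."""
    return 1 + convs_per_block * (kernel_size - 1) * sum(dilations)


# Kernel size 3 with dilations [1, 2, 4, 8], as in the TCN description.
one_conv = receptive_field(3, [1, 2, 4, 8], convs_per_block=1)   # 31 steps
two_convs = receptive_field(3, [1, 2, 4, 8], convs_per_block=2)  # 61 steps
```

Under either assumption the receptive field is larger than the 12-step window, so the last output position sees the whole input sequence.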

To ensure a fair comparison, we performed a grid search for the hyperparameters of each model using the validation set. The search space included learning rates in $\{10^{-2}, 10^{-3}, 10^{-4}\}$, batch sizes in $\{64, 128, 256, 1024\}$, and dropout rates in $\{0.1, 0.3, 0.5\}$. The final hyperparameters selected for the reported experiments are summarized in Table 2.
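The search space above contains 3 × 4 × 3 = 36 configurations per model. A minimal way to enumerate it and pick the best configuration is sketched below; `validation_score` is a hypothetical callable standing in for "train the model with this configuration and return its validation $R^{2}$".

```python
from itertools import product

SEARCH_SPACE = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "batch_size": [64, 128, 256, 1024],
    "dropout": [0.1, 0.3, 0.5],
}


def enumerate_grid(space):
    """Return every combination of the search space as a list of dicts."""
    keys = list(space)
    return [dict(zip(keys, values))
            for values in product(*(space[k] for k in keys))]


def select_best(configs, validation_score):
    """Pick the configuration with the highest validation score."""
    return max(configs, key=validation_score)
```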

Table 3 summarizes the implementation-level architectural choices used for each model. In our implementation, DeepONet uses branch and trunk MLPs formed by two 256-unit hidden layers that project to a latent embedding of dimension p (default $p = 128$). When spatial features are included, the trunk input expands from 6 to 10 (adding upstream/downstream speed and density). Learning rate, batch size, and dropout were tuned via the validation grid search described above; the DeepONet latent dimension p was kept at the default value for the reported experiments. The configurations listed here correspond to the concrete implementations used in the ablation and comparative evaluations reported below.

4.5. Operator-Learning Model

We model the one-step map from an edge’s recent speed history and its contemporaneous context to the next-step speed as a neural operator acting on two inputs: the 12-step lag vector $s_{t-11:t}(e) \in \mathbb{R}^{12}$ and the 6-d context $z_{t}(e) \in \mathbb{R}^{6}$. Let $g : \mathbb{R}^{12} \to \mathbb{R}^{p}$ and $f : \mathbb{R}^{6} \to \mathbb{R}^{p}$ be the branch and trunk embeddings. The prediction is their inner product in a p-dimensional latent space:

$\hat{y}_{t+1}(e) = \langle g(s_{t-11:t}(e)), f(z_{t}(e)) \rangle = \sum_{k=1}^{p} g_{k}(s_{t-11:t}(e)) \, f_{k}(z_{t}(e)),$

(9)

which realizes a low-rank factorization of the operator from $(s, z)$ to $y$ [22]. The overall architecture is illustrated in Figure 2. Architecturally, both branch and trunk are MLPs with hidden width 256, dropout $0.1$, and linear p-dimensional projections; we set $p = 128$. Optimization uses Adam with learning rate $10^{-3}$, batch size 1024, and up to 50 epochs with early stopping on validation $R^{2}$. All features are standardized using training statistics, and train/validation/test splits, random seeds, and library versions are fixed for reproducibility.
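To make Equation (9) concrete, the following sketch evaluates the branch–trunk inner product with two small MLPs in plain Python. It is only an illustration: the weights are random and untrained, dropout is omitted (inference mode), and the widths are deliberately small (16 hidden units, $p = 8$) so the example runs quickly, whereas the model described above uses 256 and 128. The helper names are hypothetical.

```python
import random


def make_mlp(sizes, rng):
    """Random weights for an MLP with layer widths sizes[0] -> ... -> sizes[-1]."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = (2.0 / n_in) ** 0.5
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def mlp_forward(layers, x):
    """Forward pass: ReLU on hidden layers, linear final projection."""
    for idx, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if idx < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def deeponet_predict(branch, trunk, s, z):
    """Equation (9): inner product of branch embedding g(s) and trunk embedding f(z)."""
    g = mlp_forward(branch, s)
    f = mlp_forward(trunk, z)
    return sum(gk * fk for gk, fk in zip(g, f))


rng = random.Random(0)
branch = make_mlp([12, 16, 16, 8], rng)  # 12 speed lags -> p = 8
trunk = make_mlp([6, 16, 16, 8], rng)    # 6 context features -> p = 8
y_hat = deeponet_predict(branch, trunk, [0.1 * k for k in range(12)], [0.5] * 6)
```

The two networks never see each other's inputs; they interact only through the final sum over the $p$ latent coordinates, which is what makes the factorization low-rank.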

The factorized form (9) decouples temporal history from exogenous conditions and enables counterfactual analyses without retraining: varying $z_{t}(e)$, for instance by perturbing entered or density, changes $f(\cdot)$ while keeping $g(\cdot)$ fixed, thus isolating the effect of boundary and context signals on $\hat{y}_{t+1}(e)$. This branch–trunk inner-product realization exactly matches the DeepONet formulation for operator learning [21], so we henceforth refer to our model as DeepONet.
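A counterfactual query of this kind can be sketched as follows: the branch embedding is computed once for a given history and reused while one context feature is scaled. `branch_fn` and `trunk_fn` are hypothetical callables standing in for the trained networks $g$ and $f$, and the multiplicative perturbation is one possible choice among several.

```python
def counterfactual_sweep(branch_fn, trunk_fn, s, z, index, factors):
    """Predictions for scaled copies of one context feature, with g(s) held fixed."""
    g = branch_fn(s)  # history embedding, computed once and reused
    results = []
    for factor in factors:
        z_cf = list(z)
        z_cf[index] = z[index] * factor  # perturb a single context feature
        f = trunk_fn(z_cf)
        results.append((factor, sum(gk * fk for gk, fk in zip(g, f))))
    return results
```

If the context features are standardized, the scaling would normally be applied in raw units before standardization, so that a factor of 2 means "twice the density" rather than "twice the z-score".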

The theoretical advantage of this operator learning formulation lies in its alignment with the physical nature of traffic flow. Traffic dynamics are fundamentally governed by partial differential equations, where the system state evolves as a function of time and space subject to boundary conditions. Standard deep learning models approximate a finite-dimensional mapping $\mathbb{R}^{n} \to \mathbb{R}^{m}$, effectively memorizing point-to-point correlations. In contrast, DeepONet approximates the continuous solution operator that maps the space of input functions and parameter functions to the solution space. By explicitly separating the encoding of history and context, the model learns a basis expansion of the solution operator, where the Trunk network identifies the basis functions of the traffic regimes and the Branch network computes the coefficients based on the input state. This mechanism enables robust generalization to unseen scenarios, as the model learns the underlying physical laws governing the transition between states rather than just the statistical distribution of the training data.

To further clarify the training and inference process, Algorithm 1 details the DeepONet procedure for traffic speed forecasting.

Algorithm 1 DeepONet Training and Inference for Traffic Speed Forecasting

Require: Historical speed sequence $s_{t-L+1:t}(e) \in \mathbb{R}^{L}$, context vector $z_{t}(e) \in \mathbb{R}^{d_{z}}$, target speed $y_{t+1}(e)$
Ensure: Trained Branch network $g_{\phi}$, Trunk network $f_{\psi}$
1: Initialize: Parameters $\phi, \psi$ for Branch and Trunk networks
2: Hyperparameters: Learning rate $\eta$, batch size $B$, latent dim $p$
3: while not converged do
4:     Sample batch of $B$ triples $\{(s^{(i)}, z^{(i)}, y^{(i)})\}_{i=1}^{B}$ from training set
5:     for $i = 1$ to $B$ do
6:         Compute Branch embedding: $g^{(i)} = g_{\phi}(s^{(i)}) \in \mathbb{R}^{p}$
7:         Compute Trunk embedding: $f^{(i)} = f_{\psi}(z^{(i)}) \in \mathbb{R}^{p}$
8:         Predict speed: $\hat{y}^{(i)} = \langle g^{(i)}, f^{(i)} \rangle = \sum_{k=1}^{p} g_{k}^{(i)} \cdot f_{k}^{(i)}$
9:         Compute loss: $\mathcal{L}^{(i)} = (\hat{y}^{(i)} - y^{(i)})^{2}$
10:     end for
11:     Update $\phi, \psi$ via Adam optimizer to minimize $\frac{1}{B} \sum_{i=1}^{B} \mathcal{L}^{(i)}$
12: end while
13: Inference: Given new history $s^{*}$ and context $z^{*}$, predict $\hat{y}^{*} = \langle g_{\phi}(s^{*}), f_{\psi}(z^{*}) \rangle$
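The structure of Algorithm 1 can be illustrated with a deliberately simplified, dependency-free sketch. Two substitutions are made here and should not be read as the method used in this work: the branch and trunk are single linear maps ($g(s) = As$, $f(z) = Bz$) instead of MLPs, and plain full-batch gradient descent replaces mini-batch Adam. The synthetic target is an arbitrary bilinear function chosen so that the factorized model can fit it.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    return [dot(row, x) for row in W]


def mse(A, B, data):
    """Mean squared error of the inner-product prediction <A s, B z>."""
    return sum((dot(matvec(A, s), matvec(B, z)) - y) ** 2 for s, z, y in data) / len(data)


def make_toy_data(n=64, seed=1):
    """Synthetic (history, context, target) triples with a bilinear target."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        s = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        z = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        data.append((s, z, (s[0] + 0.5 * s[1]) * (z[0] - z[1])))
    return data


def train(data, p=2, lr=0.1, steps=500, seed=0):
    """Full-batch gradient descent on the MSE of y_hat = <A s, B z>."""
    rng = random.Random(seed)
    d_s, d_z = len(data[0][0]), len(data[0][1])
    A = [[rng.uniform(-0.5, 0.5) for _ in range(d_s)] for _ in range(p)]
    B = [[rng.uniform(-0.5, 0.5) for _ in range(d_z)] for _ in range(p)]
    history = [mse(A, B, data)]
    for _ in range(steps):
        grad_A = [[0.0] * d_s for _ in range(p)]
        grad_B = [[0.0] * d_z for _ in range(p)]
        for s, z, y in data:
            g, f = matvec(A, s), matvec(B, z)       # branch and trunk embeddings
            err = 2.0 * (dot(g, f) - y) / len(data)  # d(loss)/d(y_hat)
            for k in range(p):
                for j in range(d_s):
                    grad_A[k][j] += err * f[k] * s[j]  # d(y_hat)/dA[k][j] = f_k * s_j
                for j in range(d_z):
                    grad_B[k][j] += err * g[k] * z[j]  # d(y_hat)/dB[k][j] = g_k * z_j
        for k in range(p):
            for j in range(d_s):
                A[k][j] -= lr * grad_A[k][j]
            for j in range(d_z):
                B[k][j] -= lr * grad_B[k][j]
        history.append(mse(A, B, data))
    return A, B, history
```

The gradient expressions show the coupling that the inner product introduces: the update of the branch parameters is weighted by the trunk embedding and vice versa, which is the same coupling that automatic differentiation handles in the full MLP version.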

4.6. Evaluation

We report Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and $R^{2}$:

$\mathrm{MAE} = \frac{1}{N} \sum_{i} | y_{i} - \hat{y}_{i} |, \quad \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i} (y_{i} - \hat{y}_{i})^{2}}, \quad R^{2} = 1 - \frac{\sum_{i} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i} (y_{i} - \bar{y})^{2}}.$

(10)

In brief, MAE reports the average absolute deviation in km/h and is relatively robust to outliers; RMSE reports the quadratic mean error and emphasizes large deviations, which is desirable when significant mistakes are particularly costly; and $R^{2}$ reports the proportion of variance explained relative to a mean-only baseline and can be negative if the model underperforms that baseline. Reporting de-standardized MAE and RMSE in km/h enables operational interpretation, while $R^{2}$ facilitates scale-free comparison across scenes. All metrics are computed on de-standardized speeds (km/h) [34].
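The three metrics in Equation (10) can be computed directly from their definitions. The sketch below is a plain-Python illustration; `destandardize` assumes that the target was standardized with a single training mean and standard deviation, and its name is hypothetical.

```python
import math


def destandardize(values, mean, std):
    """Map standardized values back to physical units (km/h)."""
    return [v * std + mean for v in values]


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """1 - SS_res / SS_tot; equals 0 for a mean-only predictor and can be negative."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

For example, with targets `[1, 2, 3]` and predictions `[1, 2, 4]`, the single error of 1 gives MAE = 1/3, RMSE = sqrt(1/3), and $R^{2}$ = 1 − 1/2 = 0.5.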

5. Experimental Results and Discussion

5.1. Overall Performance Comparison

Table 4 presents a comprehensive comparison of model performance across three experimental modules: (1) The SUMO Baseline, which uses temporal features only; (2) The SUMO Spatial module, which is enhanced with upstream and downstream features; and (3) The Real-World Validation using the METR-LA dataset.

In general, we observe distinct performance patterns across the three scenarios. In the linear simulation (Modules 1 and 2), temporal sequence models like LSTM and DeepONet dominate, as the system dynamics are primarily driven by local history and boundary conditions. Conversely, in the complex METR-LA network (Module 3), the advantage shifts towards architectures capable of modeling high-dimensional spatial interactions, where DeepONet and Transformer perform competitively with recent state-of-the-art models. Notably, GNNs show a significant performance jump from simulation to real-world data, consistent with their dependency on rich graph structures. While DeepONet is competitive in the simpler simulation tasks, its main strength lies in its robustness and scalability to complex, real-world topologies, where it outperforms the MLP and GNN baselines.

5.2. Baseline Simulation Experiments

Implementation Details

To establish a performance benchmark, we first evaluated the models on the standardized SUMO dataset without explicit spatial topology features. The input vector $X_{t}$ consisted of the 18 dimensions defined in Section 4.2, capturing the local temporal history from Lags 1 to 12 and the six instantaneous traffic variables, such as density and occupancy, of the target edge.

As shown in Module 1 of Table 4, the LSTM model achieved the highest accuracy with an $R^{2}$ score of 0.8188, slightly outperforming the Transformer and DeepONet, which achieved $R^{2}$ scores of 0.8152 and 0.8122, respectively. The MLP and TCN models followed with $R^{2}$ scores of 0.7975 and 0.7905. The linear Ridge baseline lagged significantly behind with an $R^{2}$ of 0.4631, confirming the nonlinear nature of the traffic dynamics. These results indicate that for a single road segment in a controlled simulation environment, the temporal autocorrelation is the dominant predictive factor. The strong performance of LSTM, Transformer, and DeepONet suggests that capturing sequence dependencies and operator-level mappings provides an advantage even in this baseline setting. Furthermore, DeepONet’s performance is comparable to the specialized LSTM, demonstrating that the branch–trunk architecture effectively encodes the temporal inertia through the branch network without requiring recurrent computation.

5.3. Spatial Feature Analysis

To assess whether the omission of spatial correlations limits accuracy, we extended the feature space to include upstream and downstream dependencies. We constructed a “Spatial” dataset, referred to as Module 2, where the input dimension was increased to 23 by appending the mean speed and density of adjacent links, comprising $v_{\mathrm{up}}, v_{\mathrm{down}}, k_{\mathrm{up}}, k_{\mathrm{down}}$.

Counter-intuitively, the inclusion of these local spatial features did not improve performance in the simulation environment; in fact, we observed a decrease in $R^{2}$ across all models. For example, DeepONet and MLP scored 0.7473 and 0.7031, respectively, compared with 0.8122 and 0.7975 in Module 1. We attribute this to two factors:

Topology Simplicity: The simulation utilizes a linear 5 km corridor where upstream conditions are highly collinear with the local temporal history; for instance, $v_{\mathrm{up}}(t)$ provides similar information to $v_{\mathrm{local}}(t-1)$.
Noise Introduction: In the microscopic simulation, short-term fluctuations in adjacent links (due to individual driver behavior) may introduce stochastic noise that outweighs their predictive signal for the aggregated 5-min interval.

This result supports a critical physical interpretation: in the DeepONet framework, the boundary conditions, such as flow entering and leaving, serve as the interface for wave propagation. In the one-dimensional Lighthill–Whitham–Richards (LWR) traffic flow model, congestion waves propagate through the boundaries. By learning the operator that maps these boundary functions to the internal state, DeepONet implicitly learns the wave propagation physics. The fact that explicit spatial features did not improve performance suggests that for this linear topology, the temporal dynamics and boundary conditions were indeed sufficient to capture these effects.

However, this negative result is scientifically valuable: it demonstrates that DeepONet’s operator learning capability is robust enough to extract maximum information from temporal dynamics alone, without relying on explicit spatial feature engineering in simple topologies. In this module, LSTM achieved the highest performance with an $R^{2}$ of 0.7483, closely followed by DeepONet with 0.7473, both outperforming the Transformer ($R^{2}$ of 0.7310) and GNN ($R^{2}$ of 0.7166). This reinforces the finding that sequence modeling and operator mapping are more effective than graph-based methods for this specific linear topology. The lower performance of GNN here, with an $R^{2}$ of approximately 0.72, highlights a limitation of graph convolutions in sparse, linear structures where message passing offers little advantage over direct temporal modeling.

Figure 3 provides a deeper robustness analysis, showing that while GNN performance degrades significantly in unseen scenarios, as seen in Figure 3a, and high-density regimes shown in Figure 3b, DeepONet maintains stable low error rates, confirming its superior generalization capabilities.

5.4. Real-World Validation

To validate the proposed approach on a complex, nonlinear network, we applied the models to the METR-LA benchmark dataset. Unlike the simulation, this dataset involves a graph of 207 sensors with complex spatial dependencies.

Here, the advantages of advanced architectures became evident. DeepONet achieved top-tier performance with an $R^{2}$ of 0.9172, significantly outperforming the MLP baseline with an $R^{2}$ of 0.8791 and surpassing the standard GNN baseline, which reached 0.8952. The Transformer also performed exceptionally well at 0.9137. DeepONet’s superior performance suggests it can capture propagation effects effectively even without explicit graph convolution layers, likely by learning the high-dimensional mapping of the system’s state.

Figure 4 visualizes this performance gap through parity plots, where DeepONet shows significantly tighter clustering around the diagonal compared to MLP and GNN, particularly in the high-speed free-flow regime.

This result suggests that while simple temporal models suffice for linear simulations, DeepONet and Transformer architectures are better suited to capturing the complex, high-dimensional spatiotemporal dynamics of real-world traffic networks. The gap of approximately 0.04 in $R^{2}$ between DeepONet/Transformer and MLP on real data supports the adoption of operator learning frameworks for practical ITS applications.

It is worth noting that the training times for MLP and LSTM in this module of approximately 4 to 6 s are significantly shorter than in the simulation experiments. This is attributed to the smaller dataset size, 34 k samples compared to 1.2 million, and the rapid convergence of these baselines, which triggered early stopping around epoch 15. Additionally, the LSTM implementation utilized a vectorized input structure to maximize GPU parallelism, avoiding the high computational cost of sequential unrolling.

The contrast in GNN performance between the simulation experiments in Module 2 and the real-world experiments in Module 3 is particularly illuminating. In the sparse, linear simulation topology, GNNs struggled with an $R^{2}$ of approximately 0.72, as the graph structure provided limited connectivity for effective message passing. However, in the dense, interconnected METR-LA graph, GNNs thrived, achieving an $R^{2}$ of 0.8952, validating their design for graph-structured data. Crucially, DeepONet performed consistently well across both regimes, demonstrating a versatility that neither pure temporal models such as LSTM nor pure spatial models such as GNN could match individually. Figure 5 further illustrates this by comparing the time-series forecasts, where DeepONet and Transformer accurately track abrupt speed drops during rush hours, unlike the lagging baselines.

5.5. Ablation Study

To verify the contribution of each component in the DeepONet architecture, we conducted an ablation study by varying the network structure and latent dimension p. Table 5 summarizes the ablation results tested on the unfiltered simulation data, which contains a significant number of zero values compared to the filtered dataset used in the main experiments. As shown in Figure 6, removing the Branch network, so that the model relies solely on the Trunk network and its exogenous features, leads to a significant performance drop of approximately 15%, confirming that the historical state trajectory encoded by the Branch network is critical for accurate forecasting. Furthermore, we analyzed the sensitivity to the latent dimension p. Performance degrades noticeably when $p < 32$, indicating underfitting, while increasing p beyond 128 yields diminishing returns, justifying our choice of $p = 128$ as a balance between accuracy and computational efficiency.

In addition to architectural components, we evaluated the impact of specific trunk features. Our analysis identified density and travel time as the most critical exogenous variables.

5.6. Discussion

The experimental results highlight several key characteristics of the DeepONet framework for traffic forecasting. First, the model demonstrates remarkable robustness across varying topological complexities. In the linear SUMO simulation, it performs on par with specialized sequence models like LSTM, while in the complex METR-LA network, it achieves state-of-the-art performance comparable to Transformers and superior to standard GNNs. This suggests that the operator learning paradigm, which maps functional spaces rather than discrete points, effectively captures the underlying physical dynamics of traffic flow regardless of the specific network structure.

Second, the “Digital Twin” capability, evidenced by the recovery of the fundamental diagram in Figure 7, distinguishes DeepONet from purely statistical baselines. By learning the operator $G : u \mapsto G(u)$, the model does not merely memorize historical patterns but internalizes the causal relationship between density and speed. This allows for reliable counterfactual reasoning, a critical feature for logistics planning where operators must evaluate hypothetical scenarios that may differ from historical averages. For example, operator-based one-step forecasts can be incorporated into rolling-horizon vehicle routing: by providing fast, link-level speed predictions under alternative boundary conditions, a routing engine can re-evaluate route costs in near real time and trigger dynamic rerouting or vehicle reassignment when predicted travel times exceed operational thresholds. Similarly, in depot scheduling and last-mile dispatch, these forecasts can feed ETA-aware sequencing and feasibility checks so that pickup/drop-off orders are proactively rescheduled to reduce delay propagation and improve on-time delivery rates.

To further investigate the model’s sensitivity to specific boundary conditions, we performed a systematic perturbation analysis. Figure 8 shows the mean predicted speed response to multiplicative scaling of each trunk feature. DeepONet exhibits physically consistent sensitivity, particularly to density and occupancy, whereas the MLP baseline often shows negligible or erratic responses, confirming the operator model’s superior ability to disentangle causal factors [5]. Regarding the sensitivity analysis in Figure 8, the nearly flat response to waiting time warrants closer interpretation. We attribute this to feature redundancy, as density and occupancy already effectively capture the congestion state in this predominantly free-flow scenario, meaning the marginal information provided by waiting time is minimal. Furthermore, while the model demonstrates robust behavior under moderate perturbations, we observed that extreme counterfactual scenarios where zero density is enforced while maintaining low speeds can yield physically inconsistent predictions. This behavior in unseen regimes highlights a limitation of pure data-driven operator learning and underscores the need for incorporating explicit physics-informed constraints in future iterations to ensure validity across the entire state space.

Third, our analysis sheds light on the nature of the traffic modeling challenge. Given that traffic variables such as density, speed, and travel time are highly correlated, we investigated potential multicollinearity issues by comparing DeepONet with Ridge regression, which is robust to multicollinearity via L2 regularization. Ridge regression performed poorly on the simulation dataset, yielding an $R^{2}$ of approximately 0.46, but achieved high accuracy on the METR-LA dataset with an $R^{2}$ of around 0.90. This stark contrast indicates that the primary challenge in the simulation environment is nonlinearity, specifically the regime shifts between free-flow and congestion, rather than multicollinearity. The superior performance of DeepONet stems from its ability to model these nonlinear operator mappings, which linear models like Ridge cannot capture effectively, regardless of their robustness to collinearity.

However, certain limitations warrant discussion. While DeepONet outperforms MLP and Ridge regression, its training time is higher, though still competitive with LSTM. Conversely, in terms of inference efficiency, DeepONet demonstrates a clear advantage. As shown in Table 4, its inference time of 0.07 s is significantly lower than that of the Transformer at 0.33 s and the GNN at 0.18 s, making it highly suitable for real-time applications where low latency is critical. Additionally, unlike GNNs which explicitly encode the adjacency matrix, DeepONet learns spatial dependencies implicitly through the Trunk network’s conditioning. While this proved effective in our experiments, it may face scalability challenges in extremely large networks where the explicit sparsity of graph convolutions offers a computational advantage. Nevertheless, the results confirm that for typical urban traffic networks, DeepONet provides a versatile and powerful alternative to existing spatiotemporal architectures.

6. Conclusions and Practical Implications

This study presented a Deep Operator Network framework for macroscopic speed forecasting that explicitly links logistics demand to traffic states. By factorizing the learning problem into a branch network for historical dynamics and a trunk network for exogenous boundary conditions, the model achieves robust cross-scenario generalization without retraining, effectively addressing the challenges of distribution shift and data heterogeneity.

6.1. Practical Implications

For logistics operators, this capability enables “what-if” analysis of routing strategies under varying congestion regimes. Planners can simulate the impact of different warehouse allocation strategies or delivery schedules on network traffic speeds without deploying physical vehicles. For traffic managers, it provides a data-driven digital twin that adapts to shifting demand patterns, allowing for proactive signal control and congestion management. Figure 7 demonstrates this “Digital Twin” capability, where the model correctly recovers the fundamental diagram of traffic flow from hypothetical inputs, validating its physical consistency [44]. However, before full-scale deployment, rigorous calibration using local data and the integration of uncertainty estimation modules such as conformal prediction are recommended to ensure safety and reliability in critical operations.

6.2. Limitations and Future Work

Despite these promising results, the current model relies on aggregated link-level features and does not explicitly capture network topology via graph convolutions, which may limit performance in large-scale networks with complex propagation effects. Additionally, while we validated the model on both SUMO simulation and METR-LA real-world data, the transfer learning capability between simulation and reality (Sim2Real) remains to be fully explored. Furthermore, the continuous operator nature of DeepONet allows for natural extension to multi-step forecasting by querying the trunk network at future time coordinates ($t + \Delta t$), offering a unique advantage over autoregressive iteration by avoiding error accumulation. Future research will focus on integrating Graph Neural Operators to better capture spatial dependencies, incorporating physical constraints via Physics-Informed Neural Networks (PINNs) to enhance interpretability, and developing unsupervised domain adaptation techniques to further bridge the gap between logistics planning simulations and real-time traffic operations. Together, these directions chart a pathway from accurate one-step forecasts to robust, decision-ready tools for real-world logistics networks.

A priority for future work is explicit uncertainty quantification and decision-aware propagation: by estimating predictive distributions or simple calibrated intervals and feeding them into optimization objectives, operators can apply risk-aware routing that trades expected travel time against tail risk during demand surges or disruptions. Equally important is Sim2Real adaptation: lightweight domain adaptation or online fine-tuning to local measurement regimes and demand patterns would increase trust in operator forecasts for operational replanning during incidents and peak periods.

Author Contributions

Conceptualization, B.Y.; methodology, D.L. and Y.C.; software, B.Y. and D.L.; validation, B.Y.; formal analysis, B.Y.; investigation, B.Y. and D.L.; resources, B.Y.; data curation, B.Y.; writing—original draft, B.Y.; writing—review and editing, J.B.; visualization, B.Y.; supervision, B.Y.; project administration, B.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially supported by the “Contract-Based Graduate Enrollment Quotas Program” of the Korea Industrial Technology Association (KOITA), funded by the Ministry of Science and ICT (MSIT), Republic of Korea. It was also supported by the China Society of Logistics (CSL) research projects: (1) “Path Identification and Strategy for Digital Transformation in Small and Medium-Sized Logistics Enterprises” (Grant No. 2025CSLKT3-083, 2025); and (2) “Operation Workflow of Smart Factory Production Logistics and AGV Path Optimization” (Grant No. 2024CSLKT3-089, 2024).

Data Availability Statement

Simulation scripts and training code are available at [45].

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

Adam	Adaptive Moment Estimation
ARIMA	AutoRegressive Integrated Moving Average
DeepONet	Deep Operator Network
FFT	Fast Fourier Transform
GNN	Graph Neural Network
GNO	Graph Neural Operator
GPS	Global Positioning System
ITS	Intelligent Transportation Systems
LSTM	Long Short-Term Memory
MAE	Mean Absolute Error
MLP	Multilayer Perceptron
OD	Origin–Destination
PDE	Partial Differential Equation
PINN	Physics-Informed Neural Network
ReLU	Rectified Linear Unit
RMSE	Root Mean Squared Error
Sim2Real	Simulation to Reality
SUMO	Simulation of Urban MObility
TCN	Temporal Convolutional Network
$R^{2}$	Coefficient of Determination

References

1. Yang, X.; Yuan, Y.; Liu, Z. Short-Term Traffic Speed Prediction of Urban Road with Multi-Source Data. IEEE Access 2020, 8, 87541–87551.
2. Yuan, H.; Li, G. A Survey of Traffic Prediction: From Spatio-Temporal Data to Intelligent Transportation. Data Sci. Eng. 2021, 6, 19–38.
3. Vlahogianni, E.I.; Karlaftis, M.G.; Golias, J.C. Short-Term Traffic Forecasting: Where We Are and Where We’re Going. Transp. Res. Part C Emerg. Technol. 2014, 43, 3–19.
4. Zhao, J.; Zhuo, F.; Sun, Q.; Li, Q.; Hua, Y.; Zhao, J. DSFormer-LRTC: Dynamic Spatial Transformer for Traffic Forecasting With Low-Rank Tensor Compression. IEEE Trans. Intell. Transp. Syst. 2024, 25, 16323–16335.
5. Li, Z.; Wang, T.; Zou, G.; Wang, R.; Li, Y. Physics-informed deep operator network for traffic state estimation. arXiv 2025, arXiv:2508.12593.
6. Yang, Z.; Wang, C. Short-Term Traffic Flow Prediction Based on AST-MTL-CNN-GRU. IET Intell. Transp. Syst. 2023, 17, 2205–2220.
7. Stockem Novo, A.; Hürten, C.; Baumann, R.; Sieberg, P. Self-Evaluation of Automated Vehicles Based on Physics, State-of-the-Art Motion Prediction and User Experience. Sci. Rep. 2023, 13, 12692.
8. Chatfield, C. Time-Series Forecasting; Chapman and Hall/CRC: Boca Raton, FL, USA, 2000.
9. Harvey, A.C. Forecasting, Structural Time Series Models and the Kalman Filter; Cambridge University Press: Cambridge, UK, 1990.
10. Krajzewicz, D.; Erdmann, J.; Behrisch, M.; Bieker, L. Recent Development and Applications of SUMO—Simulation of Urban MObility. Int. J. Adv. Syst. Meas. 2012, 5, 128–138.
11. Rojas, R.; Iglesias, J.; Mejia, G. Application of FlexSim in modeling and simulation of logistics processes. Procedia Eng. 2016, 149, 407–411.
12. Jiang, R.; Yin, D.; Wang, Z.; Wang, Y.; Deng, J.; Liu, H.; Cai, Z.; Deng, J.; Song, X.; Shibasaki, R. DL-Traff: Survey and Benchmark of Deep Learning Models for Urban Traffic Prediction. arXiv 2023, arXiv:2108.09091.
13. Lim, B.; Zohren, S. Time-series forecasting with deep learning: A survey. Philos. Trans. R. Soc. A 2021, 379, 20200209.
14. Choi, J.; Choi, H.; Hwang, J.; Park, N. Graph neural controlled differential equations for traffic forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 22 February–1 March 2022; Volume 36, pp. 6367–6374.
15. Wang, X.; Ma, Y.; Wang, Y.; Jin, W.; Wang, X.; Tang, J.; Jia, C.; Yu, J. Traffic Flow Prediction via Spatial Temporal Graph Neural Network. In Proceedings of The Web Conference 2020 (WWW ’20), Taipei, Taiwan, 20–24 April 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 1082–1092.
16. Geng, Z.; Xu, J.; Wu, R.; Zhao, C.; Wang, J.; Li, Y.; Zhang, C. STGAFormer: Spatial-temporal Gated Attention Transformer based Graph Neural Network for traffic flow forecasting. Inf. Fusion 2024, 105, 102228.
17. Ahmad, O.; Ramezankhani, M.; Deodhar, A. DETNO: A Diffusion-Enhanced Transformer Neural Operator for Long-Term Traffic Forecasting. arXiv 2025, arXiv:2508.19389.
18. Subbaswamy, A.; Saria, S. Evaluating model robustness and stability to dataset shift. Proc. IEEE 2021, 109, 802–825.
19. Wang, L.; Geng, X.; Ma, X.; Liu, F.; Yang, Q. Cross-City Transfer Learning for Deep Spatio-Temporal Prediction. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), Macao, China, 10–16 August 2019; pp. 1893–1899.
20. Carbonneau, R.; Laframboise, K.; Vahidov, R. Application of machine learning techniques for supply chain demand forecasting. Eur. J. Oper. Res. 2008, 184, 1140–1154.
21. Lu, L.; Jin, P.; Pang, G.; Zhang, Z.; Karniadakis, G.E. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nat. Mach. Intell. 2021, 3, 218–229.
22. Kovachki, N.; Li, Z.; Liu, B.; Azizzadenesheli, K.; Bhattacharya, K.; Stuart, A.; Anandkumar, A. Neural Operator: Learning Maps Between Function Spaces. J. Mach. Learn. Res. 2023, 24, 1–97.
23. Wen, G.; Li, Z.; Azizzadenesheli, K.; Anandkumar, A.; Benson, S.M. U-NO: U-shaped neural operators. J. Comput. Phys. 2022, 463, 111288.
24. Goswami, S.; Bora, A.; Yu, Y.; Karniadakis, G.E. Physics-informed deep neural operator networks. arXiv 2022, arXiv:2207.05748.
25. Chen, T.; Chen, H. Universal approximation to nonlinear operators by neural networks with arbitrary activation functions. IEEE Trans. Neural Netw. 1995, 6, 911–917.
26. Yuan, Y.; Wang, Q.; Yang, X.T. Traffic flow modeling with gradual physics regularized learning. IEEE Trans. Intell. Transp. Syst. 2022, 23, 14649–14660.
27. Lyu, K.; Wang, J.; Zhang, Y.; Yu, H. Neural Operators for Adaptive Control of Traffic Flow Models. IFAC-PapersOnLine 2025, 58, 123–128.
28. Feng, R.; Ma, A.; Jing, Z.; Gu, X.; Dang, P.; Yao, B. Understanding the uncertainty of traffic time prediction impacts on parking lot reservation in logistics centers. Ann. Oper. Res. 2024, 343, 1045–1067.
29. Ye, J.; Zhao, J.; Ye, K.; Xu, C. How to Build a Graph-Based Deep Learning Architecture in Traffic Flow Prediction: A Survey. IEEE Trans. Intell. Transp. Syst. 2023, 24, 4657–4679.
30. Li, Z.; Kovachki, N.B.; Azizzadenesheli, K.; Liu, B.; Bhattacharya, K.; Stuart, A.M.; Anandkumar, A. Fourier Neural Operator for parametric PDEs. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 4 May 2021.
31. Chowdhury, M.M.H.; Chakraborty, T. Calibration of SUMO Microscopic Simulation for Heterogeneous Traffic Condition: The Case of the City of Khulna, Bangladesh. Transp. Eng. 2024, 18, 100281.
32. Quiñonero-Candela, J.; Sugiyama, M.; Schwaighofer, A.; Lawrence, N.D. Dataset Shift in Machine Learning; MIT Press: Cambridge, MA, USA, 2009.
33. Gunawan, A.; Kendall, G.; McCollum, B.; Seow, H.V.; Lee, L.S. Vehicle routing: Review of benchmark datasets. J. Oper. Res. Soc. 2021, 72, 1794–1807.
34. Makridakis, S.; Spiliotis, E.; Assimakopoulos, V. Statistical and machine learning forecasting methods: Concerns and ways forward. PLoS ONE 2018, 13, e0194889.
35. Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
36. Hyndman, R.J.; Athanasopoulos, G. Forecasting: Principles and Practice, 3rd ed.; OTexts: Melbourne, Australia, 2021.
37. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2021.
38. Shen, Z.; Yang, H.; Zhang, S. A Survey on Universal Approximation Theorems. arXiv 2024, arXiv:2407.12895.
39. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
40. Bai, S.; Kolter, J.Z.; Koltun, V. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. arXiv 2018, arXiv:1803.01271.
41. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017.
42. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017.
43. Bai, Y.; Luo, J.; Jiang, Y.; Li, Z.; Xia, S.; Chen, H. Understanding and Improving Early Stopping for Learning with Noisy Labels. In Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Online, 6–14 December 2021.
44. Yu, H.; Wang, Y.; Jin, F.; Zhang, M.; Chen, A. A Physics-informed Deep Operator for Real-Time Freeway Traffic State Estimation. arXiv 2025, arXiv:2508.08002.
45. jbnu55343. scm_deeponet: Simulation Scripts and Training Code. 2025. Available online: https://github.com/jbnu55343/scm_deeponet (accessed on 3 October 2025).

Figure 1. Project workflow: from Solomon demand mapping and SUMO simulation, through feature construction and operator-style branch–trunk modeling, to cross-scene evaluation and diagnostics.

Figure 2. Schematic of the DeepONet architecture for traffic forecasting. The Branch network encodes the historical speed sequence (system inertia), while the Trunk network encodes the contemporaneous context (boundary conditions). The final prediction is obtained via the dot product of their respective embeddings, effectively learning the operator that maps history and context to future states.

Figure 3. Robustness analysis of DeepONet vs. baselines. (a) Cross-Scenario Generalization: DeepONet and Transformer maintain stable low error (MAE) when transferring from seen training scenarios (S001–S004) to unseen test scenarios (S005–S006), whereas MLP and GNN performance degrades. (b) Error vs. Traffic Density: As traffic density increases (x-axis), the error of the MLP model grows quadratically, indicating failure in congestion regimes. GNN shows moderate degradation, while DeepONet and Transformer remain robust, validating the effectiveness of operator learning and attention mechanisms in handling varying boundary conditions.

Figure 4. Parity plots comparing predicted vs. actual traffic speeds for (a) MLP, (b) GNN, and (c) DeepONet on the Real-World METR-LA dataset. The red dashed line represents perfect prediction ($y = x$). DeepONet shows significantly tighter clustering around the diagonal compared to MLP and GNN, particularly in the high-speed free-flow regime exceeding 40 km/h, demonstrating its superior capability in handling complex real-world dynamics.

Figure 5. Time-series forecast comparison on the METR-LA dataset (Node 112). The DeepONet shown in blue and Transformer in green accurately track the abrupt speed drops during morning and evening rush hours, whereas the MLP in orange and GNN in purple exhibit significant lag and fail to capture the full depth of the congestion valleys. The red shaded bands indicate the typical morning and evening peak periods (approximately 07:00–09:00 and 16:00–18:00).

Figure 6. Ablation diagnostics. Removing density or travel time causes the largest degradation, confirming the value of exogenous context. Increasing latent width $p$ steadily improves performance without brittleness. (a) Drop-one trunk feature importance ($Δ R^{2}$ vs. DeepONet $p = 256$). (b) Capacity sweep for DeepONet ($p \in \{64, 128, 256\}$).

Figure 7. Counterfactual analysis demonstrating the “Digital Twin” capability. We queried the trained DeepONet with hypothetical input functions representing increasing traffic density. The model correctly recovered the fundamental diagram of traffic flow, specifically the inverse relationship between speed and density, without ever being explicitly trained on physics equations, validating its ability to learn the underlying operator.

Figure 8. Zero-retraining counterfactual responses to multiplicative perturbations of trunk features. For each feature $x_{j} \in \{\text{occupancy}, \text{density}, \text{travel time}, \text{entered}, \text{left}, \text{waiting time}\}$, we evaluate the mean predicted speed after scaling that feature by $(1 + ϵ)$ with $ϵ \in \{-0.10, -0.05, 0, 0.05, 0.10\}$, holding all other inputs fixed. DeepONet, shown in blue, and a concatenation MLP, shown in orange, are compared.
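
The perturbation protocol described in this caption can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' script: `perturbation_response` and `toy_predict` are hypothetical names, the column order of the trunk features is assumed, and the toy predictor merely stands in for a trained model.

```python
import numpy as np

# Assumed column order of the six trunk features (not confirmed by the source).
TRUNK_FEATURES = ["occupancy", "density", "travel time",
                  "entered", "left", "waiting time"]
EPSILONS = (-0.10, -0.05, 0.0, 0.05, 0.10)

def perturbation_response(predict, history, context, feature_idx,
                          epsilons=EPSILONS):
    """Mean predicted speed after scaling one trunk feature by (1 + eps)."""
    means = []
    for eps in epsilons:
        scaled = context.copy()               # leave the original inputs intact
        scaled[:, feature_idx] *= (1.0 + eps)  # perturb a single feature only
        means.append(float(np.mean(predict(history, scaled))))
    return means

def toy_predict(history, context):
    """Stand-in model (hypothetical): last observed speed minus a density penalty."""
    return history[:, -1] - 0.5 * context[:, 1]

history = np.full((3, 12), 50.0)   # 12 lagged speeds per sample
context = np.full((3, 6), 20.0)    # 6 trunk features per sample
density_idx = TRUNK_FEATURES.index("density")
response = perturbation_response(toy_predict, history, context, density_idx)
```

Because no weights are updated between queries, the same loop applies to any trained predictor that accepts a history array and a context array, which is what "zero-retraining" refers to here.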

Figure 8. Zero-retraining counterfactual responses to multiplicative perturbations of trunk features. For each feature

x_{j} \in {occupancy, density, travel time, entered, left, waiting time}

, we evaluate the mean predicted speed after scaling that feature by

(1 + ϵ)

with

ϵ \in {- 0.10, - 0.05, 0, 0.05, 0.10}

, holding all other inputs fixed. DeepONet, shown in blue, and a concatenation MLP, shown in orange, are compared.

Table 1. Data sources and logistics–traffic linkage.

Layer	Fields	Usage
Demand (orders)	`cust_id`, $(x, y)$ , `qty`, $[t_{min}, t_{max}]$ , `depot_id`	Build OD flows/trips; snap to network; time-bucket by request; define boundary/context for scenes
Supply (depots)	depot coordinates; capacity (if available)	Define sources/sinks; origin assignment for orders
Routes (veh)	`vehroutes.xml`: edge sequences	Path reconstruction; edge utilization; optional node traversal via topology
Edge aggregates	`edgedata.xml`: speed, entered, left, density, occupancy, waiting time, travel time	Main supervised features/targets; per-interval edge-level learning
Vehicle summaries	`tripinfo.xml`: departures/arrivals; delays	Consistency checks; calibration/validation of OD temporal profiles

Table 2. Final training hyperparameters used in the study.

Model	Input Shape	Regularization	Optimizer & Learning Rate	Batch Size	Max Epochs/Early-Stopping Patience
Persistence (lag1)	18 (uses `lag1` only)	—	—	—	—
Ridge	18	L2 ( $α$ tuned)	closed-form/Limited-memory BFGS (LBFGS)	N/A	N/A
MLP	23	Dropout $0.1$	Adam, $10^{- 3}$	1024	50/patience 10
LSTM	$(12 \times 7)$	Dropout $0.1$	Adam, $10^{- 3}$	1024	50/patience 10
TCN	$(12 \times 7)$	Dropout $0.1$	Adam, $10^{- 3}$	1024	50/patience 10
Transformer	$(12 \times 7)$	Dropout $0.1$	Adam, $10^{- 3}$	128	100/patience 10
GNN	Graph ( $N \times F$ )	Dropout $0.3$	Adam, $10^{- 3}$	64	100/patience 10
DeepONet	Branch: 12; Trunk: 6	Dropout $0.1$	Adam, $10^{- 3}$	1024	50/patience 10

Table 3. Architectural choices used for each model.

Model	Layers/Blocks	Hidden Sizes	Embedding Dim p	Horizon
MLP	3 hidden layers	[256,128,64]	—	1 min
LSTM	2 LSTM layers	hidden = 64	—	1 min
TCN	3 conv blocks	channels = [32,32,32]	—	1 min
Transformer	2 encoder layers	$d_{model} = 64$	—	1 min
GNN	2 GCN-style ops	hidden = 64	—	1 min
DeepONet	Branch: 3 FC; Trunk: 3 FC	$[256, 256] \to p$	$p = 128$ (default)	1 min
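
The DeepONet row above, combined with the input shapes in Table 2 (12 lags for the branch, 6 context features for the trunk), can be sketched as a forward pass. This is a minimal NumPy sketch with random, untrained weights. The initialization, the linear final layers, and the absence of an output bias are assumptions, and the helper names are hypothetical.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random (untrained) weights and zero biases for a fully connected stack."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Apply the FC layers with ReLU in between; the last layer stays linear."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

def deeponet_forward(branch, trunk, history, context):
    """Predicted speed as the dot product of branch and trunk embeddings."""
    b = mlp(branch, history)      # (batch, p) embedding of the lagged speeds
    t = mlp(trunk, context)       # (batch, p) embedding of the context
    return np.sum(b * t, axis=1)  # (batch,)

rng = np.random.default_rng(0)
p = 128                                    # default embedding dimension
branch = init_mlp(rng, [12, 256, 256, p])  # 3 FC layers: 12 -> 256 -> 256 -> p
trunk = init_mlp(rng, [6, 256, 256, p])    # 3 FC layers: 6 -> 256 -> 256 -> p
history = rng.normal(size=(4, 12))
context = rng.normal(size=(4, 6))
speed_hat = deeponet_forward(branch, trunk, history, context)
```

Keeping the two stacks separate until the final dot product is what allows the context to be changed at query time without touching the branch encoding.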

Table 4. Comparative Performance of Deep Learning Models across Experimental Modules. Best results in bold.

Module	Dataset	Model	$R^{2}$ Score	MAE	RMSE	Train Time (s)	Inference Time (s)
1. SUMO Baseline	Simulation (Linear)	Persistence	−1.0749	9.66	13.75	0.0	0.0
(No Spatial)	19 Features	Ridge	0.4631	5.63	6.99	0.1	-
	1.2 M Samples	MLP	0.7975	2.86	4.30	178.4	-
		TCN	0.7905	2.97	4.37	366.2	-
		LSTM	0.8188	2.59	4.06	603.2	-
		Transformer	0.8152	2.61	4.10	2174.5	-
		DeepONet	0.8122	2.66	4.14	362.1	-
2. SUMO Spatial	Simulation (Linear)	Persistence	−2.8173	11.52	15.26	0.0	0.0
(With Spatial)	23 Features	Ridge	0.3224	5.23	6.43	0.1	-
	1.2 M Samples	MLP	0.7031	2.97	4.26	272.0	-
		TCN	0.6974	2.99	4.30	378.1	-
		LSTM	0.7483	2.60	3.92	522.2	-
		GNN (Local)	0.7166	2.81	4.16	428.0	-
		Transformer	0.7310	2.70	4.05	2958.4	-
		DeepONet	0.7473	2.66	3.93	396.6	-
3. Real-World	METR-LA (Graph)	Persistence	0.3590	7.61	17.68	0.00	0.00
(Complex)	207 Nodes	Ridge	0.9044	3.50	6.83	2.11	0.03
	34 k Samples	MLP	0.8791	4.23	7.68	4.50	0.14
		TCN	0.8949	4.10	7.16	123.74	0.39
		LSTM	0.7704	6.79	10.58	5.87	0.08
		GNN (GCN)	0.8952	4.56	7.15	96.00	0.18
		Transformer	0.9137	2.74	6.49	73.00	0.33
		DeepONet	0.9172	2.55	6.35	92.00	0.07
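
For readers checking the metric columns, the sketch below gives the standard definitions of MAE, RMSE, and $R^{2}$, and shows on a toy series why a lag-1 persistence forecast can score a negative $R^{2}$, as in the simulation rows above. Whether the study computed its metrics with exactly this code is an assumption, and the alternating series is synthetic.

```python
import numpy as np

def mae(y, yhat):
    """Mean absolute error."""
    y, yhat = np.asarray(y, dtype=float), np.asarray(yhat, dtype=float)
    return float(np.mean(np.abs(y - yhat)))

def rmse(y, yhat):
    """Root mean squared error."""
    y, yhat = np.asarray(y, dtype=float), np.asarray(yhat, dtype=float)
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def r2(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y, yhat = np.asarray(y, dtype=float), np.asarray(yhat, dtype=float)
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Synthetic speeds that flip every step: the previous value is a poor forecast.
series = np.array([10.0, 30.0, 10.0, 30.0, 10.0, 30.0])
target = series[1:]      # value to predict at each step
persisted = series[:-1]  # lag-1 persistence forecast

scores = {"MAE": mae(target, persisted),
          "RMSE": rmse(target, persisted),
          "R2": r2(target, persisted)}
```

A negative $R^{2}$ means the forecast has a larger squared error than always predicting the mean of the targets, which happens whenever consecutive values differ more than the series varies around its mean.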

Table 5. Ablations on input configuration and architecture. Best-performing configuration for each metric is highlighted in bold.

Configuration	MAE	$Δ$	RMSE	$Δ$	$R^{2}$	$Δ$
DeepONet (Branch-only; 12 lags)	12.779	+11.536	12.959	+10.927	−7.4611	−8.4547
DeepONet-occupancy	1.393	+0.150	2.447	+0.415	0.9828	−0.0108
DeepONet-density	20.193	+18.950	38.480	+36.448	−3.2520	−4.2456
DeepONet-travel time	3.065	+1.822	3.157	+1.125	0.4979	−0.4957
DeepONet-entered	3.782	+2.539	4.974	+2.942	0.9289	−0.0647
DeepONet-left	3.112	+1.869	4.423	+2.391	0.9438	−0.0498
DeepONet-waiting time	1.884	+0.641	2.671	+0.639	0.9795	−0.0141
DeepONet (p = 64)	1.310	+0.067	2.140	+0.108	0.9478	−0.0458
DeepONet (p = 128)	1.392	+0.149	2.127	+0.095	0.9484	−0.0452
DeepONet (p = 256)	1.243	+0.000	2.032	+0.000	0.9936	+0.0000
Concat-MLP (18-d)	1.430	+0.187	2.243	+0.211	0.9856	−0.0080

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Yu, B.; Chen, Y.; Luo, D.; Bae, J. Operator Learning with Branch–Trunk Factorization for Macroscopic Short-Term Speed Forecasting. Data 2025, 10, 207. https://doi.org/10.3390/data10120207
