Predicting the Aggregate Mobility of a Vehicle Fleet within a City Graph

: Predicting vehicle mobility is crucial in domains such as ride-hailing, where the balance between offer and demand is paramount. Since city road networks can be easily represented as graphs, recent works have exploited graph neural networks (GNNs) to produce more accurate predictions on real traffic data. However, a better understanding of the characteristics and limitations of this approach is needed. In this work, we compare several GNN aggregated mobility prediction schemes to a selection of other approaches in a very restricted and controlled simulation scenario. The city graph employed represents roads as directed edges and road intersections as nodes. Individual vehicle mobility is modeled as transitions between nodes in the graph. A time series of aggregated mobility is computed by counting vehicles in each node at any given time. Three main approaches are employed to construct the aggregated mobility predictors. First, the behavior of the moving individuals is assumed to follow a Markov chain (MC) model whose transition matrix is inferred via a least squares estimation procedure; the recurrent application of this MC provides the aggregated mobility prediction values. Second, a multilayer perceptron (MLP) is trained so that—given the node occupation at a given time—it can recursively provide predictions for the next values of the time series. Third, we train a GNN (according to the city graph) with the time series data via a supervised learning formulation that computes—through an embedding construction for each node in the graph—the aggregated mobility predictions. Some mobility patterns are simulated in the city to generate different time series for testing purposes. The proposed schemes are comparatively assessed compared to different baseline prediction procedures. The comparison illustrates several limitations of the GNN approaches in the selected scenario and uncovers future lines of investigation.

Depending on the application and domain, the prediction can be done based on the data of each element (disaggregated data) or using aggregated values for a given region or way.Although disaggregated data can potentially lead to more accurate results, personal mobility data may not always be available Storing this type of information long-term or sharing it with third parties may lead to privacy and legal issues.Thus, this work focuses on directly applicable and privacy-preserving models on aggregated data.
In recent years, many real-world datasets have been published, and they have become the norm in GNN-based and broader ML-based traffic forecasting studies [7].Evaluating with real data eliminates any potential questions about the validity of the data and the applicability of the model in real scenarios.However, traffic simulations have two potential benefits.Firstly, simulations have the ability of modeling unseen graphs, such as a planned road topology [7].Secondly, evaluating predictive models on synthetic and extreme scenarios can help to better understand them and discover their limitations [8].
Taking into account that asset (vehicle) mobility in a city can be formalized using the graph that defines the streets (edges), intersections (nodes), and directions (associated with edges), in this work we address the prediction of aggregated asset distribution along the nodes of this graph, based only on past aggregated distribution history.Aggregated patterns are a result of individual asset behavior, which in real-life scenarios can be very complex.This in turn hinders the ability to reason about the emergent behavior or to construct extreme scenarios that test the limits of a given model.Hence, this paper focuses on a very simple mobility model based on Markov Chains (MC).This model is parameterized by a transition matrix, which can be easily tuned.Then, we propose three prediction schemes: a method based on a Markov Chain (MC) model [9], and two methods based on Multilayer Perceptrons (MLP) [10] and graph neural networks (GNNs) [11].
The rest of this paper is organized as follows.Section 2 presents relevant works in the areas of mobility models, mobility prediction, and graph neural networks (GNNs).In Section 3, we formalize the problem as a time series prediction task.A first prediction scheme based on a Markov Chain model is developed in Section 4. In Section 5 two alternative neural network prediction schemes are designed: one based on a multilayer perceptron (MLP) and the other on graph neural networks (GNNs).Simulations performed to illustrate the applicability of the proposed schemes are presented in Section 6. Concluding remarks and future lines of work to be addressed are stated in Section 7.

State-of-the-Art Literature Review
As mentioned above, existing works are framed in the context of available individual (i.e., disaggregated) mobility data.In [1], to develop a communication infrastructure for connected and automated vehicles (CAVs), a clustering algorithm is proposed, whose evaluation is carried out via simulations that make use of a Gauss-Markov mobility (GMM) model [12,13] as a complementary tool for estimating future locations of single vehicles.In [2], the mobility prediction for individual vehicles is addressed within a grid partition of the city's geographical space, and a second-order Markov model is assumed to characterize each driver's mobility preferences within the grid.The employed hybrid architecture that combines a convolutional neural network (CNN) and a recurrent neural network (RNN) makes use of disaggregated historical grid occupation values, which are available for each individual vehicle; the resulting scheme improves the quality of vehicle mobility prediction, especially for those vehicles that have a strong individual mobility preference.In [3], the same authors consider a similar scenario to address individual mobility prediction to support intelligent vehicle applications; they propose a deep RNN-based algorithm whose results match the theoretical analysis and improve state-of-the-art previous results.In [5], a deep RNN making use of a long short-term memory (LSTM) architecture is proposed to predict individual vehicle mobility on a grid, based on individual sensor data; the prediction step is followed by a vehicle recruiting algorithm to optimize crowdsensing; the results improve the quantity of collected sensing data versus existing algorithms.In [4], a model based on a general-purpose sequence prediction model, named the compact prediction tree (CPT), is proposed for driver route prediction; the results show an efficient noise tolerance of the proposed algorithm.The prediction of individual vehicle trajectories, based on consecutive previous GPS locations combined with a statistical inference module for online mobility prediction, is proposed in [6].This inference module is based on a hidden Markov model (HMM), and the prediction stage employs an improved version of the Viterbi algorithm.Such improvements reduce computational costs.In [14], the authors present a hierarchical trajectory prediction structure using a capsule neural network (CapsNet) that embeds local temporal and global spatial correlations through hierarchical capsules represented on a grid map.This procedure outperforms state-of-the-art methods.
So far, all the existing proposals address individual vehicle prediction making use of disaggregated available data.The restriction of only having access to aggregate data is one of the novel contributions of the work presented here.

Problem Statement
This paper addresses the comparative assessment of different schemes for the prediction of the aggregated mobility of a fleet of vehicles that move along a graph describing a city.In this type of graph, the nodes represent crossroads in the city, and the directed edges represent the roads (and their directions) between the crossings.Simple simulationcontrolled scenarios will be considered for focusing on the main objective of the comparative assessment.
The aggregated mobility will be defined by the distribution of the assets along the graph nodes represented by a vector y t ∈ (N ∪ {0}) N for a set of discrete time stamps t = 0, 1, . . .Assuming available a sequence of occupation values y t , t = 0, 1, . . ., T − 1, the objective is to predict the next value y T .
Three approaches are considered to perform such predictions.On the one hand, motivated by the underlying graph structure, a Markov chain (MC) model, defined by a transition matrix Π, is assumed to model the mobility of each vehicle; then, such a matrix (denoted as Π) is estimated using past mobility data y t , t = 0, 1, . . ., T − 1.Ultimately, the prediction is performed using Π: ( The second approach considers an MLP architecture trained in a supervised manner using the past mobility data y t , t = 0, 1, . . ., T − 1, so that: where Θ is the set of adjustable parameters and d ≥ 1 defines the size of the time window of previously selected values as predictors. Finally, the third approach proposes a GNN that takes advantage of both the underlying graph structure and the proven effectiveness of graph neural networks (GNNs) for time series prediction [15].Such GNN is also trained in a supervised manner with the past mobility data y t , t = 0, 1, . . ., T − 1, so that in general: y p T = GNN(y 0 , . . ., y T−1 ). ( The details of the MLP and GNN implementations and the use of the available data are provided in Section 5.2.
The three proposed approaches are comparatively assessed together with other simpler prediction schemes employed as baselines.

A Markov Chain Model
This aggregate model assumes that the mobility of each individual vehicle in the graph can be modeled via a Markov chain, defined by the transition probabilities between the connected nodes.We formalize this assumption in the remainder of this section.

Markov Chains and Transition Matrices
Let {X t } t=0,1,...,T be a Markov Chain (MC) [9,16], which represents a discrete-time stochastic process with finite state space X t ∈ S = {s 1 , . . ., s N }, whose associated probability distribution satisfies the Markov property: P(X t+1 =s j | X t =s i , X t−1 =s k , . . ., X 0 =s l ) = P(X t+1 =s j | X t =s i ). ( Let Π be the Markov chain transition matrix so that given two states s i , s j , the term (j, i) of such matrix is denoted as Π ji = Π(s j ← s i ) = P(X t+1 =s j | X t =s i ) and represents the transition probability from state i to state j.Defined this way, the columns of Π are stochastic vectors so that ∀i = 1, . . ., N we have For each t, let us denote P t as the (discrete) probability distribution corresponding to X t .P t is defined as a stochastic vector Note that P 0 corresponds to the distribution of the initial condition or initial state X 0 ; in case such an initial state is known to take a deterministic value X 0 = s j , for some j ∈ {1, . . ., N}, then we have that P 0 = δ j = [δ 1,j , . . ., δ j,j , . . ., δ N,j ] meaning that p (j) According to (4), the distributions P t , t = 0, . . ., T − 1 for T ∈ N, T > 1 satisfy the condition Note that (5) defines a dynamical system that-starting from P 0 -allows to recursively compute for each time, t, the probability distribution, P t , corresponding to X t .
According to (4), {X t }, t ∈ {0, . . ., T} are not independent variables, but their joint distribution can be characterized stepwise making use of the corresponding conditional distributions.Hence, we can take a sample of the Markov chain by relying on such conditional distributions, which are completely defined with P 0 and matrix Π.

Binary Vector Codification of States
Let us code any state s j via column vector δ j = [δ j1 , . . ., δ jj , . . ., δ jN ] ⊤ ∈ {0, 1} N (where ⊤ denotes the transpose of a vector/matrix) whose elements are all zero but the j-th one.Then, considering this codification, we have that X t ∈ {0, 1} N so that (with some abuse of notation) E[X t ] = P t , the distribution of X t (when considering X t ∈ S).Furthermore, the distribution of X t+1 |X t =s j , can be computed as

Model for Aggregated Data
Let us consider that the data derived from M realizations of the Markov chain {x (m) t } m=1,...,M t=0,...,T is only available in an aggregated form so that for each t = 0, . . ., T only a sample of the random variable (using the binary vector notation): is available.Note that which will allow to design of some model estimation procedures.

Estimation of Transition Matrix with Aggregated Data
Let us consider that we only have available a sample of Y t defined in (6).Then, following (7) we can construct the following: an estimate of P t using only y t , the single sample of Y t .Hence, according to (5), we can expect Π to satisfy which leads to the following least squares estimation (LSE) scheme [17] for estimating Π: where ]; then the application of this LSE estimator given by ( 10) to z as a new time series, would lead to the known MLE estimator of Π for disaggregated data (see Appendix A.) Concerning the size of the parameter space of this MC model, the parameters to be estimated (i.e., learned) are the coefficients of the corresponding transition matrix Π, whose size amounts to N 2 .Problem ( 10) is a quadratic optimization problem with positivity and linear constraints.This type of problem is denoted as Quadratic programming or more specifically as a Constrained least squares problem.The canonical quadratic form for (10) and the corresponding Karush-Kuhn-Tucker's necessary conditions are derived in Appendix B.
Remark 1.The LSE estimate of Π proposed in (10) has been selected among other possible estimates (based on alternative distance measures between vectors), because the quadratic cost in (10) results in an efficiently solvable optimization problem.For instance, alternative costs could have been defined by making use of other norms ∥y t+1 − Π • y t ∥ q ; furthermore, considering that y t+1 and Π • y t can be normalized according to (8) to become a discrete distribution, some cost involving the Kullback-Leibler divergences KL( M ) could also be a reasonable option.

Two Approaches Using Neural Networks
In this section, two schemes that make use of Neural Network (NN) architectures are presented.Neural network approaches have proven to be a very efficient solution in supervised learning scenarios with high uncertainty.The first approach directly employs an MLP-based recurrent formulation that provides a natural nonlinear learning approach to improve standard time series prediction schemes; the second approach makes use of a GNN that the capabilities of an NN while making use of the available knowledge, in the form of a network structure, of the city mobility paths.

Recursive Application of a Multilayer Perceptron
This standard approach considers an MLP structure that generalizes the nonlinear formulations of the usual recurrent schemes for time series prediction.The fundamental formulation would take the form of (2) where the set of adjustable parameters Θ is determined via a training scheme.In general, such training of the MLP (i.e., estimation of Θ) should be appropriately performed to control bias and overfitting of the resulting model; in any case, it is interesting to note the parallelism between the following basic training scheme and the procedure for estimating Π via (10).Note that a priori the computation of Θ via (11) would not exploit the available knowledge about the city graph structure as (10) does.Our evaluation has focused on an MLP with a configurable number of layers (l), where the first layer contains an input per pair of nodes (N) and feature (d), and the next layers are fully connected and of size N.The learnable parameters in this model are all the weights and biases in the network, which in this case is O(N 2 × l).The specific number can be calculated as

Application of Graph Neural Networks
In this section, we assess the use of graph neural networks (GNNs) for the prediction problem defined in Section 3. GNNs are specially designed to process and learn from graph-structured data, the main applications being node classification, link prediction, and graph classification.
To address the prediction of the number of assets associated with each node, a node representation or embedding is constructed by the GNN.To perform this task, the GNN gathers and aggregates information from each node's neighbors, based on the connections within the graph.Such information is passed and updated between nodes and their neighbors through several layers of computation.Each layer typically involves two main steps: message passing, where information from neighboring nodes is aggregated, and node update, where the aggregated information is used to update the node representation.
The application of GNNs in this paper is grounded on the assumption that the learned representation can capture complex relationships and dependencies inherent in the graph that may help for the prediction of the number of assets in each node.
The main GNN architecture used in this paper is based on graph convolutional networks (GCNs) [18,19], which is formalized as follows.Let us denote f (i) the feature vector that gathers the employed information associated with the i-th node of the network.Specifically, f (i) will be composed by a concatenation of the following elements: where F = {T − W, . . ., T − 1} ⊂ {0, . . ., T − 1} and y i F = y i T−W , . . ., y i T−1 is the vector of previous asset-occupation values at node i that are provided to the GNN, i id is a (one-hot) encoding of the node i identification, and T is the time instant where the prediction is carried out.Note that W, the number of samples used as node features, is a parameter of the model.Hence, if we accordingly denote (y p T ) i to represent the i-th component of y p T (i.e., the predicted occupation of such i-th node at time T), the node-wise input-output formulation of the GNN for our problem is as follows: where AF defines a tunable affine transformation and MLP is a Multilayer Perceptron, their adjustable parameters being denoted by Θ 1 and Θ 2 , respectively.The two sets of parameters are characterized by two hyper-parameters: the number of hidden layers and the number of neurons in each layer.N (i) is the neighborhood of node i and di = 1 + ∑ j∈N (i) e j,i , where e i,j denotes the edge weight between nodes i and j.In our basic formulation, all edge weights are set to 1.
The first step carries out a weighted feature aggregation of node i neighborhood nodes according to the topology and weights of the network.Then, the AF Θ 1 processes, via a tunable affine transformation, the aggregated feature vector gathering the previous asset-occupation values, node id and instant of time, in order to provide the embedding h i .Then, the MLP Θ 2 processes a concatenation of the AF output h i with the node feature vector f i , to provide the prediction value (y p T ) i .The weights in the model described are trained through a stochastic gradient descent strategy on the whole set of nodes on a sufficiently large time series dataset (T train ).
The error in each epoch is computed by averaging the mean squared error over a randomly selected set T i ⊂ T train .
To complement this approach, we have also included an alternative that replaces the computation of h for each node with a Graph Attention Network (GAT) [20].In this type of model, the weight for each neighbor is not fixed and will be learned in the training process.
In contrast with the previous model, the number of learnable parameters in the network does not depend on the number of nodes in the graph but on the dimension of the MLP, determined by Θ 1 and Θ 2 .The number of parameters for Θ 1 is (| f | + 1)×|h|, where | f | is the number of input features and |h| is the dimensionality of h i , which has been set to 8 in our experiments.The number of parameters for where M is the number of neurons at every hidden layer, and l is the number of hidden layers.In this paper, we have chosen to limit our model to two hidden layers, of the same dimensionality as the h vector.Time complexity at prediction time is thus O(E + V × ((|h|+| f |)l × (M 2 ))).In addition to the learned parameters, the model needs to have access to the graph to generate predictions.Using a sparse representation for the adjacency matrix, the memory requirement is O(|E|) [18]

Evaluation
In order to evaluate the proposed approaches, a set of baseline prediction models was constructed for comparison purposes.The comparative evaluation was carried out on a set of synthetic time series of aggregated asset mobility.In the following section, we describe the baseline prediction models, the construction of the testing time series, and the metrics used for the evaluation.

Estimation Schemes
The following two simple prediction procedures are employed as the reference baselines: 1.
A stationary predictor using the previous value in the time series: which is equivalent to assuming an MC model with Π = I N (i.e., to assign probability 1 to the self-links in the graph).

2.
An MC predictor assuming a transition matrix Π, whose random column vectors follow a uniform distribution over the non-zero elements of the adjacency matrix of the graph:

3.
A predictor that is based only on the second step of the GNN approach (Equation ( 14)) without calculating h.This has been labeled NoGNN in the evaluation.The motivation for this predictor is to evaluate the effect of message passing in the prediction error for the selected architecture.
Alternatively, the three more elaborated approaches proposed in this paper are as follows:
Considering the model used in the generation of the evaluation datasets detailed in Section 6.2, this predictor should achieve very high accuracy given enough training data, as its underlying Markovian model is the same as that used for generating the datasets in Section 6.2.

5.
A set of predictors using an MLP architecture to be trained in the spirit of ( 11) (with a proper use of training and validation sets as explained below), and to be employed according to (2). 6.
A set of predictors based on a GNN according to (3), where several parameters can be adjusted.First, different number of characteristics W, whether the one-hot representation of the node id has been used (using the Id suffix in the model name) and whether the number of time steps is included in the feature vector (using the T suffix in the model name).For the sake of simplicity, our evaluation uses the same hyperparameters for both MLP Θ 1 and MLP Θ 2 : two hidden layers to allow for more expressiveness and two possible configurations for the number of neurons in each layer (64 and N) to evaluate the effect of dimensionality in both training and prediction error.
Note that predictors 1, 2 and 4 rely on models based on homogeneous (i.e., timeinvariant) MCs.Alternatively, predictors 5 and 6 allow modeling time-varying behaviors in some of its configurations.

Evaluation Datasets
In general, aggregated mobility data can be generated by simulating individual assets, each with its own individual mobility patterns, and calculating the aggregated occupation of each node at the desired time interval.This enables quite sophisticated behaviors that might take into account specific objectives (and the corresponding trajectories) for each driver.In this paper, we assume a common behavior on all the assets, defined by an MC.Note that this mobility behavior fits precisely the assumption of the scheme described in (1).This deliberate choice facilitates a more rigorous assessment of all other prediction schemes.Furthermore, this mobility model allows the application of a Monte Carlo simulation procedure that ends up providing an aggregated mobility behavior that suits our goal of aggregate mobility prediction.The graph used in the simulations corresponds to the center of the city of Madrid, which is characterized by a graph with N = 2000 nodes.The result of each simulation is a time series of aggregated occupations in each node.
Therefore, each time series is fully determined by the shared mobility model of the assets and the number of assets (M).For each of these two factors, multiple time series of sufficiently large size (T D = 1000) were constructed, where each time series represents the aggregation of M samples generated following the corresponding model.
The different employed (possibly non-homogeneous) MC models are characterized by the corresponding transition probability matrices, which must be compatible with the adjacency matrix of the city graph.In particular, the designed models were: The column-wise uniform transition matrix Π (which favors the performance of predictor ( 16)).(b) A randomly generated stochastic transition matrix Π r .

Training and Testing Strategies
Among the predictors proposed in Section 6.1, predictor 4 based on computing Π (an estimator of the transition matrix Π), predictor 5 employing an MLP, and predictor 6 using a GNN make use of T previously available values of the time series to construct a cost function.Predictor 4 uses the values of the time series to compute Π according to (10), whereas the other two use them (as T train ) to train the weights of either the MLP or the GNN.
Hence, for every time series available of size T D , S different sub-series of size T were constructed by randomly selecting S possible sliding windows from within the full interval of size T D .These sub-series can be characterized as follows: Each of these sub-series is used to train each model (T train = T a i ,b i ), and the results are evaluated by producing predictions of type y p t b j +1 based on T test = T a j ,b j , for any j ̸ = i, which are then compared to the true value y test = y t b j +1 .After extensive experimentation, the values S = 10 and T D = 300 were selected for the final evaluation.

Evaluation Metrics
As far as the definition of errors is concerned, different measures can be defined in line with the comments included in the Remark at the end of Section 4.4.On the one hand, the ∥ • ∥ 2 norm was considered following the quadratic cost imposed in the estimation and training procedures; on the other hand, alternative measures not aligned with the quadratic cost, such as the ∥ • ∥ 1 norm, were also considered.
Alternatively, we can bypass the stochasticity inherent in the Markov model (reflected in each different Monte Carlo implementation), by comparing the performance of the predictors with the optimal prediction (as an expected value) that could be calculated assuming available the true transition matrix Π at T − 1.This ideal predictor serves to define a performance upper bound (i.e., a prediction error lower bound).Such unavoidable lower bound error, which comes from the intrinsic randomness of vehicle mobility (inherent in any Markovian model), can be removed for comparative purposes.

Results
The results of the evaluation are summarized in Table 1.A more illustrative representation is included in Figure 1, where the distribution of the predictive error for each model and each set of evaluation parameters can be seen.Baseline predictor 1, denoted as I N , and baseline predictor 2, denoted Π, are defined, respectively, in Equations ( 15) and (16).Each column corresponds to a different type of model employed to generate the evaluation time series; as explained above, each one of those time series is constructed using a corresponding transition matrix: generator (a) with Π and generator (a) with Π r .
It is important to note that the time series generation schemes (a) and (b) proposed in Section 6.2 make use of an MC-type formulation.Hence, predictor 4 in Section 6.1, based on an MC assumption, performs especially well in such favorable scenarios.The MLP-based predictor 5, even though it can capture Markovian behaviors, cannot compete with the MC optimal predictor in these scenarios since it does not exploit the available graph knowledge.The GNN-based predictor 6, when incorporating the node id and an attention mechanism, performs better than the proposed baseline schemes 1 and 2, and its performance is very close to that of the MC-assumption-based predictor 4.
In all scenarios, GNN models outperform their NoGNN counterparts, which demonstrates the effectiveness of the GNN approach.The simpler MLP variants show higher prediction errors than any of the GNN models.This is especially true of the MLP with more hidden neurons.These results can be explained because MLP models do not incorporate any a priori knowledge about graph structure.Although nodes that are not connected in the graph should not affect each other, the layers in the MLP are fully connected.All weights, including those that are zero, need to be learned from training data However, the chosen values for T S may not be enough, especially when the number of tunable weights is very high.
As for other factors to take into account, it is worth mentioning that low asset counts, which imply low node occupancy (i.e., the average occupancy of each node close to zero), lead to higher errors in general; therefore, in that scenario, all predictors behave almost indistinguishably, due to the stochasticity of time series generative models.More specifically, it is worth noting the relatively high error of the uniform predictor in datasets generated using a column-wise uniform transition matrix (Π).This is due to the stochastic nature of the generative models explained above (see Section 6.2).

Concluding Remarks and Future Work
The motivation behind this paper is to study the capabilities and limitations of GNN models in the prediction of aggregate mobility of a vehicle fleet in a city using only historical aggregate mobility information.An evaluation scenario is proposed, where asset mobility is governed by a Markovian model that is formalized and described in detail.Three schemes are proposed and evaluated.Two of these approaches exploit the underlying graph structure used to define vehicle mobility in a city.One of them further exploits the assumption of an underlying Markov model.
The first method is based on the estimation of the transition probability matrix of a Markov model that is supposed to characterize the mobility behavior of each individual.This estimation-based method shows good performance in scenarios where the mobility time series are generated using MC models, such as the ones used in the paper; in these cases, the Markov-estimation-based method outperforms other prediction schemes.
The second proposed method employs an MLP in a recurrent formulation to generate predictions for the next node occupation as a function of previous time series values.Even though this approach can capture Markovian behaviors, its performance is not competitive since it does not exploit the available graph knowledge.
The third proposed prediction method is based on a graph neural network trained in a supervised manner.In general, this method performs much better than the proposed baseline prediction schemes, and in some configurations outperforms or competes with the first proposed method (GNN).To do so, the GNN model needs enough representation power to discriminate between different nodes and neighbors.Our results show that, in the types of scenarios proposed, this can be achieved by using a higher value of characteristics W, or by including node identifiers in the feature vector.It should be noted that for the GNN models to achieve competitive performance, node identifiers need to be provided as part of the node feature vector.This is true even in scenarios with a uniform transition matrix.
Our study highlights the effectiveness of the Markov model, partly due to our reliance on Markov processes for synthetic data generation.However, the current synthetic data approach may not reflect some of the intricate time series patterns inherent in complex traffic data.Urban traffic dynamics inherently exhibit repetitive patterns and deviations, since humans very often tend to follow pre-established routes, and/or variations of these.GNNs may excel at modeling complex time series patterns, but the absence of such nuances in our synthetic data generation suggests untapped potential in our GNN approach.Future research should incorporate synthetic models tuned with real-world traffic data.By capturing the complexities of urban movement patterns, we can fully leverage GNNs in traffic analysis.In addition, more information could be incorporated into both Markov-based and GNN models to address the problem of aggregate prediction in these new scenarios.Specifically, the GNN architecture presented in this paper could be improved in three different ways: by using different neighbor sampling strategies, by using a more nuanced aggregation function that better discriminates both temporally and by node, and by using more expressive node identification features.

Figure 1 .
Figure 1.Errors for each model and type of dataset.

Table 1 .
Table of mean errors and their standard deviation for each evaluation dataset.Baseline models are included first.The lowest error (up to the second decimal place) for each row is highlighted.