2. Materials and Methods
The methodology used for the prediction of the car traffic has two components: (i) construction of a Voronoi Neighborhood Weighted Graph (VN-WG) capturing spatial relations between sensors, and (ii) a spatiotemporal neural network combining graph convolution with recurrent modeling. Spatial embeddings are obtained with GraphSAGE, temporal dependencies are modeled with an LSTM layer, and the final prediction is made with a fully connected layer. The following subsections present the GraphSAGE formulation, the VN-WG construction, and the hybrid SAGE-Voronoi model.
2.1. Sample and Aggregate Network Layer
The sample and aggregate method (GraphSAGE), as described in [33,34], is a reliable and popular method for inductive node embedding. It incorporates node features into the learning algorithm and can learn the topological structure of each node’s neighborhood. It can be defined in the following way:
where:
F—input features (tensor of observations),
W—weight tensor (trainable parameters),
A—adjacency matrix of graph G,
n—number of vertices in graph G,
m—input sequence length (number of samples in time series),
—input features count (number of features per vertex),
—output features of SAGE count, and
—permutation invariant pooling operator (often maximum, mean or sum). ReLU (Rectified Linear Unit) introduces nonlinearity to the solution [35].
The SAGE layer produces low-dimensional tensor representations for all graph nodes in the form of a tensor with dimensionality
, which is a concatenation of two terms: the features tensor F multiplied by the weight tensor W, and the features tensor of each node’s neighborhood, aggregated by permutation invariant pooling and multiplied by the same weight tensor W. Next, the embedding is propagated to a temporal modeling layer, such as an LSTM or a Gated Recurrent Unit (GRU). A fully connected layer forms the final prediction. Various approaches can be used to design a node’s neighborhood in the graph. Assuming that we are dealing with real-world spatiotemporal data, the most intuitive approach is either to utilize the known topology of the graph with a distance-based threshold or, if the graph is unknown, to model the graph structure only by distances between nodes, as in [36]:
where
is a coefficient in the adjacency matrix between nodes indexed a and b, and
and
are domain-specific parameters that depend on the real-world distances d. As it can be challenging to rationally estimate the slope of (2) that is guided by
, the simplified approach is often used:
which produces a binary adjacency matrix. The binary adjacency matrix also simplifies the pooling operator in (1) because it does not have to take edge weights into account:
where
is a permutation invariant pooling function (see explanation under (1)).
However, in that approach, we lose some information about graph topology. We can convey richer information about the neighborhoods of vertices in a graph by using the approach we will discuss in the following sections.
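The simplified, distance-thresholded construction and the pooling over unweighted neighbors can be illustrated with a few lines of NumPy. This is only a sketch under stated assumptions: the function names `binary_adjacency` and `mean_pool`, the `threshold` argument, and the choice of the mean as the pooling function are hypothetical and chosen for illustration, and self-loops are excluded by assumption.

```python
import numpy as np


def binary_adjacency(coords, threshold):
    """Connect two distinct nodes when their Euclidean distance is at most `threshold`."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = (dist <= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)  # assumption: a node is not its own neighbor
    return adj


def mean_pool(features, adj):
    """Average the feature rows of each node's neighbors (zeros for isolated nodes)."""
    degree = adj.sum(axis=1, keepdims=True)
    summed = adj @ features
    return np.divide(summed, degree, out=np.zeros_like(summed), where=degree > 0)


# Three sensors on a line: the first two are close, the third is far away.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
features = np.array([[1.0], [3.0], [10.0]])

adj = binary_adjacency(coords, threshold=1.5)
pooled = mean_pool(features, adj)
print(adj)
print(pooled)
```

In this toy case the third sensor has no neighbor within the threshold, so it receives no spatial context at all, while every neighbor that does fall within the threshold contributes equally regardless of how near it is.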
2.2. Voronoi Neighborhood Weighted Graph
Let us recall the definition of a Voronoi diagram, which will be needed later.
Definition 1 (Voronoi diagram [23]).
Let S be a set of points in the plane. For two distinct points $p, q \in S$, the dominance of $p$ over $q$ is defined as the subset of the plane that is at least as close to $p$ as to $q$:
$$\mathrm{dom}(p, q) = \{x \in \mathbb{R}^2 : \delta(x, p) \le \delta(x, q)\}, \qquad (5)$$
where $\delta$ is the Euclidean distance; we consider only the planar case $\mathbb{R}^2$ in this paper. The cell of a point $p \in S$ is the portion of the plane lying in all of the dominances of $p$ over the remaining points in S:
$$\mathrm{cell}(p) = \bigcap_{q \in S \setminus \{p\}} \mathrm{dom}(p, q). \qquad (6)$$
Taken over all points $p \in S$, the cells (6) form a partition of the plane, which is called the Voronoi diagram.
Once we have established the terminology used in the definition of a Voronoi diagram, we can define a Voronoi Neighborhood Weighted Graph.
Definition 2 (Voronoi neighborhood weighted graph).
Let a set of points in the plane be given. Define the Voronoi adjacency graph as the graph whose nodes are these points and in which two nodes are joined by an edge if the corresponding Voronoi cells share a boundary (those cells are adjacent to each other). For a given maximal neighborhood size, the Voronoi Neighborhood Graph has the same nodes, and two distinct nodes are joined by an edge if the length of the shortest path between them in the Voronoi adjacency graph does not exceed that size. A weighted version is obtained by assigning to each edge a weight derived from this shortest-path length using one of the scaling rules (7)–(9).
In the next section, we present a recursive algorithm that can be used to calculate the Voronoi neighborhood graph.
2.3. Voronoi Neighborhood Graph Calculation
Let us assume that we are registering data at a finite number of points with known coordinates, that no two points have identical coordinates, and that the data are time series. Also, let us assume that we anticipate the influence of spatial relations between nodes on time series values, and that the strength of the influence is positively correlated with the proximity between points. The intuitive approach to modeling the spatial relationship between these points is to represent them as a graph derived from the Voronoi diagram. The nodes of this graph are the sensor locations; an edge connects two points if their Voronoi cells share a boundary.
The Voronoi neighborhood graph G is derived from this Voronoi adjacency graph and has an additional parameter, the maximal neighborhood size. G consists of all the nodes of the Voronoi adjacency graph. Two nodes of G are connected if there is a path between them in the Voronoi adjacency graph of length no greater than the maximal neighborhood size.
To calculate the Voronoi adjacency graph and G, we can apply Delaunay triangulation because the Delaunay triangulation of a discrete point set corresponds to the dual graph of the Voronoi diagram [37]. The proposed algorithm for calculating both graphs is presented in Algorithm 1.
| Algorithm 1 Calculate Voronoi neighborhood graph. |
| Require: P—a set of n points that represent the spatial position of measurements |
| (for example, the position of sensors at road crossings), |
| all points have distinct coordinates: |
| ( is an Euclidean distance function), |
| —maximal neighborhood size. |
| Begin |
| ▹ perform Delaunay tessellation, returns data structure T |
| ▹ which for each holds information |
| ▹ about each that has a common edge |
| ▹ initialize adjacency matrix of size with zeros |
| ▹ for the graph that will be generated for each of the n points in P |
| procedure Calculate_A() ▹ Calculate the adjacency matrix |
| ▹ where i—initial point index, |
| ▹k—neighbor point index, |
| ▹T—Delaunay tessellation structure, |
| ▹A—adjacency matrix, |
| ▹d—actual neighborhood distance. |
| for do |
| if ( or ) and then |
| |
| end if |
| if then |
| Calculate_A |
| end if |
| end for |
| end procedure |
| for do ▹ Fill adjacency matrix for each |
| Calculate_A |
| end for |
| return A |
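The idea behind Algorithm 1 can be sketched in Python with SciPy's Delaunay tessellation. The sketch is an illustration under stated assumptions, not the authors' implementation: it swaps the recursive procedure for a breadth-first search, which should yield the shortest hop counts that Definition 2 asks for, and it assumes that the matrix A stores the hop count (from 1 up to the maximal neighborhood size) for connected pairs and 0 otherwise. The names `voronoi_neighborhood_matrix` and `max_size` are hypothetical.

```python
from collections import deque

import numpy as np
from scipy.spatial import Delaunay


def voronoi_neighborhood_matrix(points, max_size):
    """Return an n x n matrix of hop counts (1..max_size) in the Delaunay graph, 0 elsewhere."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    tri = Delaunay(points)
    # Neighbors of vertex v are indices[indptr[v]:indptr[v + 1]].
    indptr, indices = tri.vertex_neighbor_vertices
    adjacency = np.zeros((n, n), dtype=int)
    for start in range(n):
        hops = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if hops[node] == max_size:
                continue  # do not expand beyond the maximal neighborhood size
            for neighbor in indices[indptr[node]:indptr[node + 1]]:
                neighbor = int(neighbor)
                if neighbor not in hops:
                    hops[neighbor] = hops[node] + 1
                    adjacency[start, neighbor] = hops[neighbor]
                    queue.append(neighbor)
    return adjacency


# Five points: point 3 lies between point 0 and point 4, so 0 and 4 are two hops apart.
points = [(0.0, 0.0), (1.0, 1.0), (1.0, -1.0), (1.5, 0.0), (4.0, 0.0)]
hops1 = voronoi_neighborhood_matrix(points, max_size=1)
hops2 = voronoi_neighborhood_matrix(points, max_size=2)
print(hops2)
```

With `max_size=1` the result is the plain Voronoi adjacency graph; raising `max_size` adds second-order and higher-order neighbors without any distance threshold to tune. The breadth-first search visits each node at most once per starting point, which avoids the repeated visits a naive recursion can incur.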
Figure 1 illustrates the process of calculating the Voronoi neighborhood. In this image, a Voronoi diagram has been generated from a set of points. To calculate the neighborhood of the point indicated in red, we evaluate all cells around it with increasing diameter. The neighborhood level between the red and blue points is color-coded. We repeat this procedure for all points to compute the adjacency matrix of G.
The next step in our approach is to rescale the adjacency matrix A generated by Algorithm 1 so that the longer the path between two nodes in the Voronoi adjacency graph, the smaller the corresponding value in the rescaled adjacency matrix. In other words, the weights of the edges in the Voronoi neighborhood graph should be inversely related to the path length between nodes in the Voronoi adjacency graph. To achieve this, we can apply one of several possible approaches:
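Later sections refer to the three rescaling rules as linear scaling (7), exponential scaling (8), and binary thresholding (9). The sketch below shows what rules of these three kinds could look like when applied to a matrix of hop counts. The concrete formulas, the `base` argument, and the function names are assumptions made for illustration; they are not taken from Equations (7)–(9).

```python
import numpy as np


def scale_linear(hops, max_size):
    """Assumed linear rule: weight 1 for direct neighbors, 1/max_size at the farthest hop."""
    hops = np.asarray(hops, dtype=float)
    weights = (max_size - hops + 1.0) / max_size
    return np.where(hops > 0, weights, 0.0)


def scale_exponential(hops, base=2.0):
    """Assumed exponential rule: weight is divided by `base` with every extra hop."""
    hops = np.asarray(hops, dtype=float)
    return np.where(hops > 0, base ** (1.0 - hops), 0.0)


def scale_binary(hops):
    """Binary thresholding: every node within the neighborhood gets weight 1."""
    return (np.asarray(hops) > 0).astype(float)


# Hop counts for three nodes on a path: 0 - 1 - 2.
hops = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
linear = scale_linear(hops, max_size=4)
exponential = scale_exponential(hops)
binary = scale_binary(hops)
print(linear, exponential, binary, sep="\n")
```

All three variants keep zero entries at zero, so nodes outside the neighborhood stay disconnected; they differ only in how quickly the influence of more distant neighbors fades.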
2.4. Voronoi Neighborhood in Graph Neural Network: SAGE-Voronoi
After applying Algorithm 1 and one of the approaches (7)–(9), we can use the rescaled adjacency matrix in the SAGE layer (1). To do so, we modify the pooling operator in (1) so that it takes edge weights into account:
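One way an edge-weighted pooling step could be combined with the SAGE layer is sketched below in NumPy. The sketch follows the verbal description of the layer given in Section 2.1 (concatenation of the node's own transformed features with the transformed pooled neighborhood features, followed by ReLU). The weighted mean, the normalization by the sum of edge weights, and the names `weighted_mean_pool`, `sage_layer`, and `kernel` are assumptions, not the exact form of Equation (10).

```python
import numpy as np


def weighted_mean_pool(features, weights):
    """Average neighbor features, weighting each neighbor by its edge weight."""
    total = weights.sum(axis=1, keepdims=True)
    summed = weights @ features
    return np.divide(summed, total, out=np.zeros_like(summed), where=total > 0)


def sage_layer(features, kernel, weights):
    """Concatenate own and pooled neighbor features (same kernel for both), then apply ReLU."""
    own = features @ kernel
    pooled = weighted_mean_pool(features, weights) @ kernel
    return np.maximum(np.concatenate([own, pooled], axis=-1), 0.0)


# Three nodes with one input feature; nodes 0 and 2 are second-order neighbors (weight 0.5).
features = np.array([[1.0], [2.0], [4.0]])
weights = np.array([[0.0, 1.0, 0.5], [1.0, 0.0, 1.0], [0.5, 1.0, 0.0]])
kernel = np.array([[1.0, -1.0]])  # hypothetical trained weights: 1 input -> 2 outputs

out = sage_layer(features, kernel, weights)
print(out)
```

With a binary weight matrix this reduces to ordinary mean pooling, which is why the same layer can serve both the basic and the Voronoi-weighted variant.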
The graph convolutional neural network proposed in this paper is composed of a SAGE layer (1) with the edge-weighted pooling operator (10) for graph embedding, followed by an LSTM layer for temporal modeling. The final, third layer is the fully connected layer that calculates the network response by performing a linear combination of the LSTM outputs. We will refer to this network later in this paper as SAGE-Voronoi. The loss function used for training is the mean squared error (MSE):
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2, \qquad (11)$$
where $y_i$ is an observed value, $\hat{y}_i$ is the corresponding prediction, and $N$ is the number of predicted values.
A “Basic” SAGE network can also be used to perform an ablation study on the influence of the parameter
in the SAGE-Voronoi network. In practice, when we replace the rescaled adjacency matrix (7)–(9) in (10) by the binary adjacency matrix (3), the SAGE layer works the same as a SAGE-Voronoi layer. The maximal neighborhood size plays a role in Algorithm 1 similar to that of the distance threshold in (3); however, when using the scaling functions (7)–(8), we can produce a non-binary adjacency matrix. Replacing (3) with the Voronoi neighborhood graph scaled with binary thresholding (9) in (10) changes the spatio-temporal graph problem from a pure distance-based approach to a neighborhood-based approach.
2.5. Dataset
To evaluate the usability of the proposed network, we used a city-scale car traffic dataset from the city of Darmstadt, which is available for download from https://opendata.darmstadt.de/search/tags/Transport%20und%20Verkehr-24 (accessed on 4 November 2025). Darmstadt is an example of a smart city whose data serve as a reference for various studies in statistics and machine learning [1,38,39,40,41].
The data is updated every minute and is provided in CSV format. The collection contains traffic volume values for individual intersections with known coordinates. The connections (roads) between intersections are not present in the dataset. We will treat this set of intersection coordinates as points P in Definition 2 and in Algorithm 1.
To download the data, we used the script available at https://github.com/browarsoftware/darmstadt_download (accessed on 4 November 2025). We used 35 days of data starting on 1 March 2024, resampled to 10-min intervals. During this period, 104 crossings with active sensors were included. We added up the measurements from all the sensors at each crossing, so we did not account for the direction of car traffic. As a result, the adjacency matrix is symmetric. If any sensor did not provide traffic data during the 35 days, we replaced the missing readings with zeros (we did not apply any procedure to fill in missing data). Our dataset does not contain information on why the car traffic values are missing. There may be two reasons for this: the closure of the road infrastructure and the measurement sensor being turned off, or a malfunction of the measurement sensor. Since we do not know the reasons, excluding this type of data would distort the evaluation of our method, as we want to test its performance on real-world data. Moreover, a common approach to missing data is to approximate it. In practice, this approximation is based on solutions analogous to those used for prediction [42,43], such as graph neural networks, which are similar to the predictors we evaluate in this paper. We therefore decided that estimating missing data is outside the scope of this paper. This does not affect the evaluation of the proposed method, as the experimental setup for all tested methods is identical. We split the dataset into train, validation, and test subsets in proportions
,
, and
, respectively, starting from the earliest to the latest time periods. The train and validation data were used during training. Evaluation of method performance was made on the test dataset. Each subset was randomly shuffled using a fixed seed. We repeated our experiments 10 times, changing the seed each time. Each of the three datasets was standardized by removing the mean and scaling to unit variance.
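The chronological split and the per-subset standardization described above can be sketched as follows. The fractions used here (0.5 and 0.25) are placeholders chosen only for illustration and are not the proportions used in the experiments; the function names are hypothetical, and the guard for constant columns is an added assumption.

```python
import numpy as np


def chronological_split(series, train_frac, val_frac):
    """Split a (time, sensors) array into train/validation/test blocks, earliest first."""
    n = len(series)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    return series[:train_end], series[train_end:val_end], series[val_end:]


def standardize(block):
    """Remove the per-sensor mean and scale to unit variance within one subset."""
    mean = block.mean(axis=0)
    std = block.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # assumption: leave constant columns unscaled
    return (block - mean) / std


# Toy data: 8 time steps, 2 sensors.
series = np.arange(16, dtype=float).reshape(8, 2)
train, val, test = chronological_split(series, train_frac=0.5, val_frac=0.25)
train_z = standardize(train)
print(train.shape, val.shape, test.shape)
```

Splitting before any shuffling keeps later observations out of the training block, which matters for time series because neighboring samples are strongly correlated.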
Figure 2 shows the locations of car crossings on a map of the city of Darmstadt. Red crossings are those discussed in detail in Section 3 and Section 4.
3. Results
We implemented our method in Python 3.8 using the machine learning libraries TensorFlow 2.8 [45] and Keras 2.8 [46], together with SciPy 1.8 [47], which provides the Delaunay tessellation. Our implementation of the original SAGE GNN, which we significantly extended, is based on the source code at https://keras.io/examples/timeseries/timeseries_traffic_forecasting/ (accessed on 4 November 2025). The source code and dataset for our experiments can be downloaded from https://github.com/bielprze/SAGE-Voronoi (accessed on 4 November 2025), and the experiments are fully reproducible.
To evaluate the proposed SAGE-Voronoi method, we used the dataset described in Section 2.5. We considered three short-term forecast horizons: a 1-sample horizon (10 min ahead), a 2-sample horizon (20 min ahead), and a 3-sample horizon (30 min ahead). We tested three adjacency scalers, as defined by Equations (7)–(9). The results of the SAGE-Voronoi network were compared with those of the original SAGE approach and the simple Pure LSTM approach. Both SAGE and SAGE-Voronoi used 64 LSTM units, with the length W in (1) set to 10. The networks were trained using the RMSprop optimizer [48] with a learning rate of
for 40 epochs. The maximum neighborhood size
in Algorithm 1 was set to 5. We have used mean as a pooling function in permutation invariant pooling for both SAGE and SAGE-Voronoi.
A Pure LSTM network consists of an LSTM layer with 200 units, a fully connected dense layer with 200 units and ReLU activation, and a final dense layer with a size equal to the forecast horizon. The network was trained using the Adam optimizer [49] with a learning rate of
for 200 epochs. The loss function was mean squared error (MSE). We used the meta-parameters for SAGE-family networks proposed by the creators of the original SAGE implementations.
We also compared the performance of the proposed solution with two state-of-the-art deep learning approaches to car traffic prediction, namely the Diffusion Convolutional Gated Recurrent Unit (DCGRU) [50] and the Spatio-Temporal Graph Convolutional Neural Network (STGCN) [36]. For DCGRU, as suggested in the original implementation, we use the MSE loss, the Adam optimizer with a learning rate of
, 30 epochs of training, 20 DCGRU units, and a Diffusion Convolution parameter
. We have utilized the network implementation available at
https://github.com/mensif/DCGRU_Tensorflow2 (accessed on 22 November 2025). For STGCN, as suggested in the original implementation, we use the MSE loss, the Root Mean Square Propagation (RMSprop) optimizer with a learning rate of
, 40 epochs of training, and a channel size of
in the ST-convolution block. We have utilized the network implementation available at https://github.com/Swadesh13/STGCN-Tf2 (accessed on 22 November 2025). Each network was trained for the specified number of epochs, until no further decrease in loss was observed, at which point training was stopped.
We have evaluated four error functions: MSE (11), root mean squared error (RMSE) (12), mean absolute error (MAE) (13), and mean relative error (MRE) (14).
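The four error functions can be computed with a short NumPy helper. MSE, RMSE, and MAE follow their standard definitions. The MRE variant shown here (the mean of per-sample absolute errors divided by the absolute actual value, with a small `eps` floor to avoid division by zero) is an assumption, since several definitions of relative error are in common use; the function name `error_metrics` is hypothetical.

```python
import numpy as np


def error_metrics(y_true, y_pred, eps=1e-8):
    """Return MSE, RMSE, MAE, and an assumed per-sample MRE as a dictionary."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = float(np.mean(err ** 2))
    return {
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
        "mae": float(np.mean(np.abs(err))),
        # Assumed MRE: absolute error relative to the actual value, averaged over samples.
        "mre": float(np.mean(np.abs(err) / np.maximum(np.abs(y_true), eps))),
    }


# Toy traffic counts and forecasts for four samples.
metrics = error_metrics([10, 20, 30, 40], [12, 18, 33, 40])
print(metrics)
```

For this toy input the absolute errors are 2, 2, 3, and 0, which gives an MSE of 4.25, an MAE of 1.75, and a relative error of 10% under the assumed definition.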
Results for the Pure LSTM network are presented in Table 1, for the DCGRU network in Table 2, for STGCN in Table 3, for the SAGE network in Table 4, and for SAGE-Voronoi in Table 5. All results were averaged over 10 repetitions with different random seeds (see Section 2.5); hence, the table headers report “Mean MSE,” “Mean RMSE,” “Mean MAE,” and “Mean MRE” plus-minus standard deviation. In Table 6, Table 7 and Table 8, we present confidence intervals for the differences between mean RMSE values, calculated at a confidence level of 0.975 for 1-sample, 2-sample, and 3-sample prediction, respectively. A confidence interval equal to 0 indicates that the difference is not statistically significant, suggesting there may be no real difference between the means. Our experiment aimed to check which network is best at predicting car traffic in three short-term time horizons (1, 2, and 3 samples ahead). In the evaluation, we did not consider individual streets separately but calculated the aggregate average of metrics (11)–(14) for the entire city. In practice, analyzing the network’s performance on individual streets would be impractical for a city the size of Darmstadt and would obscure the overall effectiveness of the tested methods. However, to better visualize street-level performance, Figure 3 and Figure 4 present detailed traffic forecast values for three selected crossings that are representative of our dataset. These crossings are A003, A017, and A019.
Figure 5 presents the mean RMSE of all tested neural networks for 1-sample prediction (standard deviations are shown as error bars).
In Table 9 we present a comparison of the mean runtime (training + inference) for all tested approaches (seconds per epoch) and the prediction time (on the whole test dataset). All methods were evaluated on a GPU, except for STGCN predictions, which, due to insufficient GPU RAM, were evaluated on a CPU (on the tested hardware setup, the GPU accelerates calculations approximately 4 times compared to the CPU).
4. Discussion
As shown in Table 1, Table 2, Table 3, Table 4, Table 5 and Figure 5, all graph neural networks outperform the Pure LSTM architecture. Providing node topology information clearly improves the predictive capabilities of the DCGRU, STGCN, SAGE, and SAGE-Voronoi architectures. A non-linear fully connected layer in the Pure LSTM approach is insufficient to deduce this information from the training dataset. The Mean MRE of the Pure LSTM predictions never dropped below
while the Mean RMSE was around 0.9 and the Mean MAE around 0.4.
DCGRU performs slightly better than LSTM; however, it obtains worse results than the other graph networks. According to Table 6, Table 7 and Table 8, the difference in performance measured by RMSE is statistically significant for all considered forecast horizons. STGCN performs better than LSTM and DCGRU, but has higher variance than SAGE-family networks. For the 1-sample horizon (10 min), there is no significant difference between STGCN and SAGE-family networks (see Table 6). For the 2-sample (20 min) and 3-sample (30 min) horizons, Table 7 and Table 8 show that STGCN performs significantly worse than SAGE-family networks (the values in the corresponding columns are above zero). The RMSEs in Table 3 are 79.8 and 85.8, respectively, while for the SAGE in Table 4, they are 7374 and 8202. We can observe a significant difference between SAGE and SAGE-Voronoi for scaling (7) and (8) for a 1-sample prediction in Table 6, and for scaling (8) for 2- and 3-sample predictions in Table 7 and Table 8. In all those cases, SAGE-Voronoi performs better than a basic SAGE network.
It is also worth noting that there is a positive correlation between the values of MSE, RMSE, MAE, and MRE. This means that an increase or decrease in one of these metrics is reflected in an increase or decrease in the others (this is obvious for MSE and RMSE). The MRE metric is especially important because it shows the error as a percentage of the actual value. In the Darmstadt dataset, traffic varies from dozens to thousands of cars per sample, so a relative measure is more appropriate for assessing prediction quality. SAGE and SAGE-Voronoi yielded very similar results. For forecast horizons of 10 min (1 sample) and 30 min (3 samples), the results of SAGE-Voronoi were better across all considered adjacency scaling methods. For one and three samples, the prediction Mean MRE of SAGE-Voronoi dropped by
compared to the SAGE approach when using linear scaling (7). For the 20 min prediction (2 samples), SAGE-Voronoi has slightly worse results in the case of exponential scaling (8) for all error metrics. In comparison, the other two adjacency scalings resulted in better performance across all metrics, except Mean MRE, which has identical values for (9). Therefore, we conclude that, in most cases, applying the Voronoi neighborhood graph improvement proposed in this paper has a positive effect on the SAGE graph neural network’s prediction performance.
Figure 3 and Figure 4 present a more detailed visualization of the networks’ performance on three different crossings. These three crossings were selected because they exhibit significantly different scales of average traffic per unit of time. As shown, SAGE and SAGE-Voronoi perform very similarly; however, the quantitative error measures demonstrate the advantage of the SAGE-Voronoi approach. The Pure LSTM approach is visibly and quantitatively inferior to the other methods. An interesting phenomenon also occurs at the A017 crossing, where there is heavy traffic from April 1 to April 2 (see Figure 4a and the enlarged fragment in Figure 4b). SAGE-Voronoi was able to predict the excessive traffic more accurately.
As shown in Table 9, using the Darmstadt dataset as the benchmark, the slowest of the considered algorithms is STGCN. One epoch of training lasts about 10.1 s. This is nearly twice as slow as one training epoch of DCGRU, over three times as slow as LSTM, and four times as slow as SAGE and SAGE-Voronoi. The SAGE network is the fastest; however, it is only 8% faster than SAGE-Voronoi. In terms of prediction time on the entire test dataset, the considered networks rank in the same order; however, DCGRU and LSTM operate at nearly identical speeds. In the case of STGCN, prediction is measured on the CPU rather than the GPU due to insufficient GPU RAM (on the hardware we used, the GPU typically provides about four times acceleration compared to the CPU). We can conclude that the SAGE network is the fastest, while SAGE-Voronoi is the second-fastest. However, we must acknowledge that this comparison might be unreliable: although we used implementations of all methods from their original papers, we cannot guarantee that those implementations are optimal.
In summary, the proposed SAGE-Voronoi graph neural network allows reliable prediction of varied car traffic among network nodes. It also better fits the non-typical data in our dataset, demonstrating superior generalization performance compared to the basic SAGE network.