Article

Spatio-Temporal Recursive Method for Traffic Flow Interpolation

Gang Wang, Yuhao Mao, Xu Liu, Haohan Liang and Keqiang Li

1 Highway Monitoring and Emergency Response Center, Ministry of Transport of the P.R.C., Beijing 100029, China
2 School of Vehicle and Mobility, Tsinghua University, Beijing 100084, China
3 CCSE Lab, Beihang University, Beijing 100083, China
4 School of Economics and Management, Beihang University, Beijing 100083, China
* Author to whom correspondence should be addressed.
Symmetry 2025, 17(9), 1577; https://doi.org/10.3390/sym17091577
Submission received: 24 April 2025 / Revised: 27 July 2025 / Accepted: 30 August 2025 / Published: 21 September 2025
(This article belongs to the Section Computer)

Abstract

Traffic data sequence imputation plays a crucial role in maintaining the integrity and reliability of transportation analytics and decision-making systems. With the proliferation of sensor technologies and IoT devices, traffic data often contain missing values due to sensor failures, communication issues, or data processing errors. Effectively imputing these missing entries is essential to ensure the correctness of downstream tasks. Unlike many other data sources, traffic flow monitoring data exhibit significant temporal and spatial correlations, yet most existing methods do not fully integrate both types of correlation. In this work, we introduce the Temporal–Spatial Fusion Neural Network (TSFNN), a framework designed to address missing data recovery in transportation monitoring by jointly modeling spatial and temporal patterns. The architecture incorporates a temporal component, implemented with a Recurrent Neural Network (RNN), to learn sequential dependencies, alongside a spatial component, implemented with a Multilayer Perceptron (MLP), to learn spatial correlations. For performance validation, the model was benchmarked against several established methods. Using real-world datasets with varying missing-data ratios, TSFNN consistently delivered more accurate interpolations than all baseline approaches, highlighting the advantage of combining temporal and spatial learning within a single framework.

1. Introduction

In the realm of modern transportation systems, the significance of accurate and comprehensive traffic flow data cannot be overstated. The availability of reliable traffic information serves as the bedrock for informed decision making, operational efficiency, and, most critically, ensuring the safety of commuters and travelers. However, the integrity and completeness of traffic data can often be compromised due to various factors such as sensor failures, network disruptions, or incomplete coverage. In such scenarios, the process of traffic data interpolation emerges as a vital tool. Through advanced computational techniques, interpolation fills in gaps in datasets, providing a cohesive and continuous flow of information crucial for optimizing traffic management strategies and enhancing overall road safety. This article investigates how to interpolate traffic monitoring data more efficiently and reliably.
When discussing monitoring data interpolation, typically two primary methods are employed: static interpolation and dynamic interpolation. Static approaches overlook the inherent temporal dynamics present in time series, instead processing monitoring records as uniform sequences without explicitly modeling their sequential dependencies.
Traditional static interpolation approaches often rely on simple statistical or pattern-based strategies, such as last observation carried forward, or replacing missing entries with the mean, median, or values derived from recurring patterns [1]. Other widely used techniques include linear [2], polynomial [3], and spline interpolation [4], which are straightforward to implement for traffic monitoring datasets. However, these methods inherently assume that the data remain stable over time, limiting their applicability when faced with highly dynamic traffic conditions. In related work, Ref. [5] applied matrix decomposition and reconstruction to recover missing entries in monitoring datasets that resemble tunnel measurements, and similar strategies have been explored in subsequent studies [6,7,8].
Static interpolation approaches fail to account for the dynamic nature of traffic monitoring data, making them incapable of accurately reflecting the evolving patterns underlying data fluctuations. Given the pronounced variability that often characterizes such datasets, these methods have been shown to fall short of delivering the level of interpolation accuracy required under these conditions.
Dynamic interpolation explicitly targets the time-varying characteristics of monitoring data, encompassing both spatial and temporal dimensions, and is therefore commonly categorized into spatial and temporal interpolation. In the temporal domain, the emphasis lies in extracting and leveraging the intrinsic temporal dependencies present in monitoring sequences. Classical approaches such as the Autoregressive Moving Average (ARMA) and the Autoregressive Integrated Moving Average (ARIMA) models [9] operate under the assumption of linear trends in the data. This assumption, however, restricts their effectiveness when confronted with monitoring datasets exhibiting nonlinear temporal patterns. In the spatial domain, a range of statistical and machine learning techniques have been applied, including random forests [10,11], support vector machines (SVM) [12], and multiple imputation approaches [11,13]. The K-nearest neighbor (KNN) algorithm [11] estimates missing entries by measuring the similarity of neighboring samples within the dataset's spatial context. Nonetheless, many of these methods perform only limited exploration of spatial feature representations. More recently, Recurrent Neural Networks (RNNs) [14,15,16] have been considered better suited to temporal interpolation tasks because they can capture the complex time dependencies inherent in the data. Some neural network architectures designed around temporal information have simple structures [17,18], and this limited structural complexity often yields unsatisfactory imputation performance. Moreover, existing methods pay insufficient attention to spatial features. In summary, methods for traffic monitoring data interpolation are usually divided into two categories, static and dynamic, and their ability to model spatio-temporal correlations remains limited.
Traffic monitoring data, as typical spatio-temporal data, have obvious spatial and temporal dependence characteristics. As mentioned earlier, existing research either overlooks these dual features or selectively focuses on one aspect while ignoring the other. In addition to the practical challenges posed by missing data, traffic flow datasets also exhibit intrinsic symmetry characteristics in both temporal and spatial dimensions. Temporally, traffic patterns typically exhibit periodic symmetry, such as diurnal or weekly cycles, while spatially, the influence between different sensors is mutual. We explicitly leverage these symmetric properties by designing a spatio-temporal framework that jointly models temporal dependencies via bidirectional recurrent networks and spatial dependencies via symmetric sensor relationships. Specifically, we introduce the Temporal–Spatial Fusion Neural Network (TSFNN), an architecture that augments conventional RNN-based frameworks by explicitly incorporating spatial relationship modeling: a Bi-LSTM models temporal dependencies, and an MLP models spatial dependencies. We then conducted data imputation experiments using real-world datasets affected by missing data, aiming to determine the effectiveness of tailored methods for such situations. The contributions of this study are as follows:
  • A spatio-temporal model for interpolating traffic monitoring data. The TSFNN architecture incorporates dedicated temporal and spatial components, enabling the extraction of spatio-temporal features and their subsequent integration within a unified framework.
  • A spatial module based on a Multilayer Perceptron. TSFNN uses an MLP with self-masking parameters for spatial information mining.
  • High interpolation accuracy has been demonstrated on real traffic monitoring datasets with different missing rates.

2. Related Work

The imputation of missing traffic data has been a long-standing problem, and numerous solutions have been proposed based on machine learning, statistical modeling, and, more recently, deep learning techniques. In this section, we review representative work in two categories: traditional machine learning and statistical methods, and deep learning-based approaches.

2.1. Traditional Machine Learning and Statistical Methods

Early-stage methods are primarily static, which treat the traffic data as regular sequences and fill missing values based on local observations. Common techniques include the last observation carried forward, as well as mean, median, and pattern-based imputation [1]. Analytical approaches such as linear interpolation [2], polynomial interpolation [3], and spline interpolation [4] also fall into this category. More advanced statistical methods leverage low-rank matrix completion and reconstruction [5,6,7,8] to approximate missing values by uncovering global latent structures. These approaches, while effective in stable environments, often fail under dynamic traffic conditions due to their assumption of data stationarity.
To capture real-world variability, dynamic interpolation methods consider the temporal or spatial context of missing values. Spatial modeling techniques include random forests [10,11], support vector machines [12], and K-Nearest Neighbors (KNN) [11], which use feature similarity to estimate missing entries. Multiple imputation has also been explored in spatial settings [13]. Temporal methods, such as ARIMA models [9], rely on historical trends but are limited by linearity assumptions. Hybrid statistical methods [19] include Bayesian Maximum Entropy (BME) models, which estimate values under uncertainty by leveraging spatial and temporal distributions, especially in sensor-based IoT environments. Least Squares SVM (LS-SVM) [20] has also been adapted for predictive modeling directly with missing inputs, bypassing explicit imputation. Additionally, some works [21] explore deploying KNN and missForest on embedded devices like Raspberry Pi, enabling in situ data imputation for edge-based traffic monitoring applications.

2.2. Deep Learning-Based Methods

Deep learning has emerged as a powerful alternative for traffic data imputation, proving particularly effective at capturing complex temporal and spatial correlations. Recurrent Neural Networks (RNNs) and their variants, particularly LSTM-based models, have been widely applied to interpolate missing sequences by modeling long-term dependencies [14,15,16]. While effective, their performance may be constrained by limited structural complexity, as seen in early designs [17,18]. More recently, TIDER [22] has been proposed to decouple time series into trend, periodicity, and residual components, improving imputation fidelity by separating dynamic factors. To fully utilize both temporal and spatial information, researchers have proposed joint spatio-temporal models: GRIN [23] introduces a bidirectional graph recurrent architecture, in which forward and backward graph RNNs extract direction-aware features, followed by a refinement module for final imputation. TimesNet [24] transforms one-dimensional sequences into two-dimensional structures and models inter-cycle and intra-cycle variations using modular components. DLinear [25] simplifies imputation by decomposing sequences into trend and seasonal parts, which are modeled through independent linear layers. These models significantly outperform traditional techniques, especially under high missing rates, due to their ability to learn deep representations of spatio-temporal correlations. With the advancement of edge computing, deep learning models are also being adapted for low-resource settings, including on-device deployment for real-time imputation. This is particularly relevant for smart transportation systems, where data quality must be ensured even under communication or computation constraints.

3. Framework and Methodology

This section presents a formal definition of the traffic flow data imputation problem, including scenarios with continuously missing segments, together with the requisite preliminaries.

3.1. Problem Formalization and Preliminaries

Firstly, traffic flow data typically include readings collected from predefined sensor arrays at regular intervals during specific time periods.
Let $T$ represent the total monitoring period and $D$ the number of sensors. As shown in Figure 1, the collected data can be structured as a multivariate time series $X = [x_1, x_2, \ldots, x_T]$. At each time step $t$ (with timestamp $H_t$), the associated sensor measurements are denoted as $x_t \in \mathbb{R}^D$. Moreover, $x_t$ can be expressed as $[x_t^1, x_t^2, \ldots, x_t^D]$, where $D$ indicates the dimensionality of the time series, which is equivalent to the number of sensors deployed.
In practice, unexpected factors such as sensor malfunctions frequently lead to incomplete traffic flow datasets, resulting in the loss of certain observations. Such missing intervals are often continuous in nature (e.g., as illustrated by the blue-highlighted region in Figure 2, which shows three consecutive missing time steps). To allow the model to process sequences containing missing entries, we assign $x_t^d = 0$ whenever the measurement $x_t^d$ is unavailable. However, this strategy risks conflating absent values with genuine zero readings. To address this, we introduce a masking vector $m_t$ to explicitly indicate the presence or absence of each measurement in $x_t$. Denoting the $d$-th element of $m_t$ as $m_t^d$, the vector is defined as

$$m_t^d = \begin{cases} 0, & \text{if } x_t^d \text{ is missing}, \\ 1, & \text{otherwise}. \end{cases} \quad (1)$$

We write $M = [m_1\ m_2\ \cdots\ m_T]$ for the mask matrix.
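To make the masking concrete, the following minimal NumPy sketch (our illustration, not the authors' code; it assumes missing readings arrive as NaN) builds the zero-filled input and the mask matrix of Equation (1):

```python
import numpy as np

def build_inputs(raw):
    """raw: (T, D) array of sensor readings with NaN where missing.
    Returns the zero-filled series X and the mask matrix M of Eq. (1)."""
    mask = (~np.isnan(raw)).astype(np.float32)  # m_t^d = 1 if observed, 0 if missing
    x = np.nan_to_num(raw, nan=0.0)             # set x_t^d = 0 wherever the reading is absent
    return x, mask

# Toy example: 4 time steps, 2 sensors, two missing entries.
raw = np.array([[12.0, 7.0], [np.nan, 8.0], [15.0, np.nan], [14.0, 9.0]])
X, M = build_inputs(raw)   # M == [[1,1],[0,1],[1,0],[1,1]]
```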
We introduce $\gamma_t$ to denote the elapsed time since the most recent valid observation for a given feature. This variable serves to capture historical patterns of data absence, which can provide valuable contextual information for the downstream model. Let $H_t$ denote the timestamp at step $t$ and $m_t^d$ the missing indicator of the $d$-th sensor at time $t$. The value of $\gamma_t$ can then be formulated as

$$\gamma_t^d = \begin{cases} H_t - H_{t-1} + \gamma_{t-1}^d, & \text{if } t > 1,\ m_{t-1}^d = 0, \\ H_t - H_{t-1}, & \text{if } t > 1,\ m_{t-1}^d = 1, \\ 0, & \text{if } t = 1, \end{cases} \quad (2)$$

where $\gamma_t^d$ denotes the $d$-th element of $\gamma_t$. We write $\Gamma = [\gamma_1\ \gamma_2\ \cdots\ \gamma_T]$ for the attenuation matrix.
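A small NumPy sketch of Equation (2) (hypothetical function name) that reproduces the interval bookkeeping illustrated in Figure 2:

```python
import numpy as np

def time_intervals(H, mask):
    """H: (T,) observation timestamps; mask: (T, D) missing indicators.
    Returns the attenuation matrix gamma of Eq. (2), row by row."""
    T, D = mask.shape
    gamma = np.zeros((T, D), dtype=np.float32)  # gamma_1 = 0 by definition
    for t in range(1, T):
        delta = H[t] - H[t - 1]
        # carry the accumulated gap forward while the previous reading was missing
        gamma[t] = np.where(mask[t - 1] == 0, delta + gamma[t - 1], delta)
    return gamma

H = np.array([0, 3, 5, 6, 9, 11, 14], dtype=np.float32)  # timestamps from Figure 2
```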
Consider the case where, at timestamp $t$, the reading from sensor $d$ is unavailable, i.e., $x_t^d$ is unobserved. The goal is to estimate $z_t^d$ as an approximation to the true value. In most existing approaches for traffic flow data imputation, $z_t^d$ is inferred using other measurements from the same sensor, $x_{t'}^d$ with $t' \neq t$. Such strategies rely solely on the temporal continuity of the individual monitoring sequence, neglecting spatial information from other sensors. Conversely, some methods draw upon readings $x_t^{d'}$ from different sensors ($d' \neq d$) at the same timestamp, yet they disregard temporal dependencies from other time steps. As noted earlier, traffic flow data exhibit pronounced spatio-temporal dependencies, both within a single stream (intra-flow) and across different streams (cross-flow). Existing interpolation techniques struggle to simultaneously account for both aspects. To address this limitation, we propose TSFNN, a framework that generates an estimate $\tilde{x}_t^d$ by jointly leveraging intra-stream and inter-stream correlations, thereby capturing both temporal and spatial dependencies in a unified manner.
The aim of the TSFNN framework is to define a mapping function $F$ capable of accurately interpolating missing entries such that the predicted values are consistent with the underlying distribution of the original dataset. This objective can be formulated as minimizing the interpolation error. In this work, the absolute error is adopted as the loss metric. Let $x_t^d$ denote the ground-truth value for sensor $d$ at time $t$, which is unobserved in the dataset, and let $\tilde{x}_t^d = F(X)$ represent the corresponding estimate derived from the available observations. The absolute loss is then expressed as $L(x_t^d, \tilde{x}_t^d) = |x_t^d - \tilde{x}_t^d|$. The interpolation task can thus be reframed as finding a function $F$ that satisfies

$$\min_F \mathrm{AVE}\Big[\sum_{t=1}^{T}\sum_{d=1}^{D} (1 - m_t^d)\, L(x_t^d, \tilde{x}_t^d)\Big] = \min_F \mathrm{AVE}\Big[\sum_{t=1}^{T}\sum_{d=1}^{D} (1 - m_t^d)\, \big|F(X) - x_t^d\big|\Big] \quad (3)$$

where $\mathrm{AVE}$ represents the averaging operator.

3.2. Framework of TSFNN

As shown in Figure 3, the TSFNN framework is composed of two main interpolation components: the Temporal Module and the Spatial Module. The temporal module aims to extract temporal patterns from the data streams of individual sensors, thereby modeling their dynamics over time. In contrast, the spatial module is responsible for capturing inter-sensor dependencies—especially among sensors operating in similar environments—and for refining the predictions produced by the temporal module. The detailed architecture of the proposed framework is presented in Figure 4.
In the proposed framework, the temporal module is implemented using a Recurrent Neural Network (RNN)-based architecture, while the spatial module is constructed with an enhanced Multilayer Perceptron (MLP). Next, we introduce the temporal module in Section 3.2.1, the spatial module in Section 3.2.2, the fusion method in Section 3.2.3, and the loss function in Section 3.2.4.

3.2.1. Temporal Module

The temporal module is implemented through a function $F_t$ operating within each data stream. In our framework, $F_t$ is realized using a Long Short-Term Memory (LSTM) network. The structure of the LSTM is shown in Figure 5.
The LSTM [26] is designed to model sequential data and capture long-range temporal dependencies. By incorporating gating mechanisms, it mitigates the vanishing gradient problem commonly observed in conventional RNNs. An LSTM unit contains a memory cell, denoted as $c_t$, along with three gates: the input gate $i_t$, the forget gate $f_t$, and the output gate $o_t$. The memory cell retains or discards information depending on the signals from these gates, while the gates regulate the flow of information through the sequence. This process can be described as

$$f_t = \sigma(W_f x_t + V_f h_{t-1} + b_f) \quad (4)$$
$$i_t = \sigma(W_i x_t + V_i h_{t-1} + b_i) \quad (5)$$
$$o_t = \sigma(W_o x_t + V_o h_{t-1} + b_o) \quad (6)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + V_c h_{t-1} + b_c) \quad (7)$$
$$h_t = o_t \odot \tanh(c_t) \quad (8)$$

where $\sigma$ represents the sigmoid function, $\tanh$ the hyperbolic tangent, and $\odot$ element-wise multiplication. The trainable parameters of the model include the weight matrices $W$ and $V$, as well as the bias vectors $b$.
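For readers who prefer code to formulas, the following PyTorch sketch transcribes Equations (4)-(8) directly (parameter names are our assumptions; in practice one would use torch.nn.LSTM):

```python
import torch

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step, Eqs. (4)-(8). p is a dict holding W_*, V_*, b_* tensors."""
    f = torch.sigmoid(x_t @ p["Wf"] + h_prev @ p["Vf"] + p["bf"])  # forget gate, Eq. (4)
    i = torch.sigmoid(x_t @ p["Wi"] + h_prev @ p["Vi"] + p["bi"])  # input gate, Eq. (5)
    o = torch.sigmoid(x_t @ p["Wo"] + h_prev @ p["Vo"] + p["bo"])  # output gate, Eq. (6)
    c = f * c_prev + i * torch.tanh(x_t @ p["Wc"] + h_prev @ p["Vc"] + p["bc"])  # Eq. (7)
    h = o * torch.tanh(c)                                          # Eq. (8)
    return h, c
```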
To comprehensively capture temporal dependencies in the data, a bidirectional LSTM (Bi-LSTM) [27] is employed in constructing the temporal module. In this configuration, the forward hidden state at time $t$ receives input from step $t-1$, while the backward hidden state receives input from step $t+1$. This design ensures that $x_t^d$ is not directly used when estimating $\tilde{x}_t^d$. The mathematical formulation of the Bi-LSTM model is given as follows:

$$\tilde{x}_t = W_x [\overrightarrow{h}_{t-1} \circ \overleftarrow{h}_{t+1}] + b_x = \overrightarrow{W}_x \overrightarrow{h}_{t-1} + \overleftarrow{W}_x \overleftarrow{h}_{t+1} + b_x \quad (9)$$
$$\overrightarrow{\varepsilon}_t = \exp\{-\max(0,\ \overrightarrow{W}_\varepsilon \overrightarrow{\gamma}_t + \overrightarrow{b}_\varepsilon)\} \quad (10)$$
$$\overrightarrow{h}_t = \mathrm{LSTM}(\overrightarrow{h}_{t-1} \odot \overrightarrow{\varepsilon}_t,\ x_t \odot m_t) \quad (11)$$
$$\overleftarrow{\varepsilon}_t = \exp\{-\max(0,\ \overleftarrow{W}_\varepsilon \overleftarrow{\gamma}_t + \overleftarrow{b}_\varepsilon)\} \quad (12)$$
$$\overleftarrow{h}_t = \mathrm{LSTM}(\overleftarrow{h}_{t+1} \odot \overleftarrow{\varepsilon}_t,\ x_t \odot m_t) \quad (13)$$

Here, the arrows indicate the forward and backward directions of information flow. The operator $\odot$ denotes element-wise multiplication, while $\circ$ represents the concatenation operation. The variables $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ correspond to the hidden states in the forward and backward passes, respectively. The parameters $W$, $V$, and $b$ represent the trainable weight matrices and bias vectors.
Equation (9) functions as the regression component, mapping the forward hidden state $\overrightarrow{h}_{t-1}$ and the backward hidden state $\overleftarrow{h}_{t+1}$ to the predicted vector $\tilde{x}_t$. In Equations (10) and (12), two temporal decay factors, $\overrightarrow{\varepsilon}_t$ and $\overleftarrow{\varepsilon}_t$, are applied to attenuate $\overrightarrow{h}_{t-1}$ and $\overleftarrow{h}_{t+1}$, respectively. Conceptually, as $\overrightarrow{\gamma}_t$ or $\overleftarrow{\gamma}_t$ increases, signifying a longer interval since the last observation, the associated decay factor decreases, thereby intensifying the attenuation of the hidden states. In other words, as the current time step becomes more distant from the nearest observed value, the impact of $\overrightarrow{h}_{t-1}$ or $\overleftarrow{h}_{t+1}$ on estimating $x_t$ diminishes. These decay factors represent the temporal missing patterns inherent in the sequence, which are essential for achieving accurate interpolation [14]. Equations (11) and (13) update the forward and backward hidden states, respectively, using the corresponding decayed hidden state from the prior step. At this stage, only intra-stream temporal relationships are captured, and the resulting estimates serve as intermediate outputs rather than the final interpolation results.
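A minimal PyTorch sketch of the decayed recurrence in Equations (10) and (11) follows (forward direction only; module and variable names are our assumptions, and the backward pass is symmetric):

```python
import torch
import torch.nn as nn

class DecayedLSTMStep(nn.Module):
    """One forward step of Eqs. (10)-(11): attenuate the previous hidden
    state by exp(-max(0, W_eps @ gamma_t + b_eps)) before the LSTM update."""
    def __init__(self, n_sensors, hidden_size):
        super().__init__()
        self.decay = nn.Linear(n_sensors, hidden_size)   # W_eps, b_eps
        self.cell = nn.LSTMCell(n_sensors, hidden_size)

    def forward(self, x_t, m_t, gamma_t, h_prev, c_prev):
        eps_t = torch.exp(-torch.relu(self.decay(gamma_t)))    # Eq. (10)
        h, c = self.cell(x_t * m_t, (h_prev * eps_t, c_prev))  # Eq. (11)
        return h, c
```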

3.2.2. Spatial Module

The spatial module aims to model inter-flow dependencies, interpreted as spatial correlations between sensors. In real-world scenarios, sensors situated in similar environments frequently display analogous data patterns, which can appear as either positive or negative correlations. Therefore, the estimate $z_t^d$ can be informed by $c_t^{d'}$ with $d' \neq d$, where $c_t$ denotes the monitoring vector $x_t$ after its missing entries have been preliminarily imputed by $\tilde{x}_t$. Moreover, the strength of the mutual dependency between two sensors is generally inversely related to the physical distance separating them. In practice, spatial relationships are implicitly encoded within the multiple data streams originating from monitoring datasets. Such latent structures can be identified and learned through appropriate computational techniques.
The spatial module implements a cross-stream interpolation function $U$. We define

$$\hat{x}_t^d = U\big(c_t \setminus c_t^d\big), \quad (14)$$

where $c_t$ denotes the data vector at timestamp $H_t$, which includes the in-stream estimates obtained at this step, and $c_t \setminus c_t^d$ denotes $c_t$ with its $d$-th component excluded. This formulation ensures that the estimation $\hat{x}_t^d$ leverages information from other data streams at the same time step, thereby eliminating the direct influence of its own prior estimation. The function $U$ is realized using a Multilayer Perceptron (MLP). Let $\hat{x}_t$ represent the spatial estimation vector, whose mathematical expression is given by
$$c_t = m_t \odot x_t + (1 - m_t) \odot \tilde{x}_t, \quad (15)$$
$$\hat{x}_t = \sigma(W_x c_t + b_x), \quad (16)$$

where $W_x$ and $b_x$ are trainable parameters and $\sigma$ is the sigmoid function. The diagonal of the parameter matrix $W_x$ is set to 0 to ensure that the estimate $\hat{x}_t^d$ is not affected by $x_t^d$. Note that in this design, we do not explicitly use a spatial distance matrix or sensor adjacency graph. Instead, spatial relationships are implicitly learned through the MLP layer, which captures co-occurrence patterns across sensors. The diagonal masking of $W_x$ ensures that the imputation for each sensor is influenced only by other sensors, allowing the model to learn asymmetric but structured dependencies directly from data.
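The self-masking can be implemented by zeroing the diagonal of the weight matrix at every forward pass; a sketch (our naming, single linear layer as in Equation (16)):

```python
import torch
import torch.nn as nn

class SelfMaskedLinear(nn.Module):
    """Spatial regression of Eq. (16): sensor d's estimate is a function of
    all sensors except d, enforced by zeroing the diagonal of W_x."""
    def __init__(self, n_sensors):
        super().__init__()
        self.linear = nn.Linear(n_sensors, n_sensors)
        self.register_buffer("off_diag", 1.0 - torch.eye(n_sensors))

    def forward(self, c_t):
        w = self.linear.weight * self.off_diag  # kill self-connections
        return torch.sigmoid(c_t @ w.T + self.linear.bias)
```

Masking at forward time, rather than once at initialization, keeps the diagonal at zero even as the optimizer updates the weights.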

3.2.3. Spatial–Temporal Fusion Module

Figure 4 illustrates the full architecture of the TSFNN model. The two intermediate estimates generated within the framework have distinct emphases: the left one focuses on temporal dependencies within each individual flow, whereas the right one models spatial correlations among different flows. To merge these complementary outputs, we employ a weighting factor $\alpha_t$ that fuses the temporal estimate $\tilde{x}_t$ with the spatial estimate $\hat{x}_t$, guided by the temporal gap indicators $\overrightarrow{\gamma}_t$ and $\overleftarrow{\gamma}_t$. Let $z_t$ represent the final result of the interpolation process, which is expressed as

$$\alpha_t = \sigma\big(W_\alpha [\overrightarrow{\gamma}_t \circ \overleftarrow{\gamma}_t \circ m_t] + b_\alpha\big), \quad (17)$$
$$z_t = \alpha_t \odot \tilde{x}_t + (1 - \alpha_t) \odot \hat{x}_t. \quad (18)$$

Here, the weight vector satisfies $\alpha_t \in [0, 1]^D$. In Equation (16), the spatial estimate $\hat{x}_t$ is derived from $c_t$, where each element of $c_t$ may correspond either to an actual observation or to a temporally interpolated value. To adaptively learn the weighting coefficients, we incorporate the time interval vectors $\overrightarrow{\gamma}_t$ and $\overleftarrow{\gamma}_t$ together with the masking vector $m_t$, as formulated in Equation (17).
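Equations (17) and (18) amount to a learned, per-sensor convex combination; a hypothetical PyTorch sketch:

```python
import torch
import torch.nn as nn

class FusionGate(nn.Module):
    """Fusion of Eqs. (17)-(18): a sigmoid gate alpha_t in [0,1]^D blends the
    temporal and spatial estimates, conditioned on the gap vectors and mask."""
    def __init__(self, n_sensors):
        super().__init__()
        self.gate = nn.Linear(3 * n_sensors, n_sensors)  # input [gamma_fwd, gamma_bwd, m_t]

    def forward(self, x_temporal, x_spatial, gamma_fwd, gamma_bwd, m_t):
        alpha = torch.sigmoid(self.gate(torch.cat([gamma_fwd, gamma_bwd, m_t], dim=-1)))
        return alpha * x_temporal + (1 - alpha) * x_spatial  # z_t, Eq. (18)
```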

3.2.4. Loss Function

Our goal is to reduce estimation errors while preserving a distribution of interpolated data that aligns closely with that of the original dataset. Since all missing entries are masked during training, the true distribution of unobserved values is inaccessible. To address this, we promote distributional consistency by matching the estimated values within observed segments to the corresponding actual data points, thereby approximating the overall distribution of the complete dataset. Thus, we use the observed values to calculate the error, that is, the error between the estimated and the observed values. The absolute error $|z_t^d - x_t^d|$ is used as the loss for the estimate above, and the mean absolute error (MAE) is employed as the total loss for the entire dataset. This loss can be defined as

$$L(z, x) = \frac{\sum_{t=1}^{T}\sum_{d=1}^{D} m_t^d \, |z_t^d - x_t^d|}{\sum_{t=1}^{T}\sum_{d=1}^{D} m_t^d} \quad (19)$$

The model is updated by minimizing the loss $L(z, x)$.
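In code, the masked training loss of Equation (19) is essentially one line (a sketch):

```python
import torch

def masked_mae(z, x, mask):
    """Eq. (19): mean absolute error computed over observed entries only."""
    return (mask * (z - x).abs()).sum() / mask.sum().clamp(min=1.0)
```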

3.3. Evaluation Metrics

Interpolation accuracy was assessed using the mean absolute error (MAE) and root mean square error (RMSE), two standard metrics for regression tasks. Regression models estimate mathematical relationships by analyzing associations between features and targets. The MAE reflects the average absolute deviation between predictions and ground truth in the same units as the data, making it straightforward to interpret. The RMSE has a similar formulation but penalizes larger deviations more heavily due to the squaring operation.
Let $\hat{x}_t^d$ be the estimated value for a missing entry in dimension $d$ at time $t$, $x_t^d$ the corresponding ground truth, and $m_t^d$ the mask indicating whether the entry is included in evaluation. The MAE and RMSE are computed as

$$\mathrm{MAE} = \frac{\sum_{t=1}^{T}\sum_{d=1}^{D} m_t^d \, |\hat{x}_t^d - x_t^d|}{\sum_{t=1}^{T}\sum_{d=1}^{D} m_t^d} \quad (20)$$
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{t=1}^{T}\sum_{d=1}^{D} m_t^d \, (\hat{x}_t^d - x_t^d)^2}{\sum_{t=1}^{T}\sum_{d=1}^{D} m_t^d}} \quad (21)$$
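A NumPy sketch of Equations (20) and (21), where eval_mask marks the artificially removed entries whose ground truth is known (an assumption about the evaluation bookkeeping):

```python
import numpy as np

def mae_rmse(x_hat, x_true, eval_mask):
    """Eqs. (20)-(21) restricted to the entries selected by eval_mask."""
    err = (x_hat - x_true) * eval_mask
    n = eval_mask.sum()
    mae = np.abs(err).sum() / n
    rmse = np.sqrt((err ** 2).sum() / n)
    return mae, rmse
```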

4. Experiment

4.1. Datasets

We evaluated our model’s performance on four widely used real-world traffic datasets, PeMS03, PeMS04, PeMS07, and PeMS08, all sourced from the Caltrans Performance Measurement System (https://pems.dot.ca.gov/ (accessed on 12 February 2024)). Each dataset records the volume of vehicles passing through each sensor at 5-min intervals, resulting in 288 time steps per day per sensor.

4.1.1. Dataset Missing Value Manufacturing

To assess TSFNN’s ability to interpolate multivariate time series under various defect rates, we introduced controlled missingness into the dataset. Specifically, continuous temporal gaps were created to mimic consecutive missing segments, while additional values were removed following a Missing Completely at Random (MCAR) scheme. This process produced a hybrid missing pattern that combines sequential and random deletions, enabling a thorough evaluation of the model’s performance.
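The paper does not publish the corruption script, but a plausible sketch of such a hybrid pattern looks like this (block_rate, block_len, and random_rate are illustrative assumptions, not reported values):

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(x, block_rate=0.1, block_len=12, random_rate=0.2):
    """Hybrid missingness: contiguous gaps per sensor plus MCAR deletions."""
    T, D = x.shape
    mask = np.ones((T, D), dtype=np.float32)
    for d in range(D):  # continuous segments (e.g., a failed sensor hour = 12 steps)
        n_blocks = int(block_rate * T / block_len)
        for s in rng.integers(0, T - block_len, size=n_blocks):
            mask[s:s + block_len, d] = 0.0
    mask *= rng.random((T, D)) >= random_rate  # MCAR point deletions
    return np.where(mask == 1, x, np.nan), mask
```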

4.1.2. Data Normalization

To ensure fair comparison and minimize the influence of scale differences among features, a standard normalization step was applied in preprocessing. In particular, zero-mean normalization [28] was used, with each feature standardized based on its mean and variance computed from the original dataset.
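As an illustration, a masked zero-mean normalization might look as follows (a sketch; computing the statistics over observed entries only is one reasonable reading of "computed from the original dataset"):

```python
import numpy as np

def zscore(x, mask):
    """x: zero-filled (T, D) series; mask: observation indicators.
    Standardizes each sensor using the mean/variance of its observed values."""
    mean = (x * mask).sum(axis=0) / mask.sum(axis=0)
    var = (((x - mean) * mask) ** 2).sum(axis=0) / mask.sum(axis=0)
    return (x - mean) / np.sqrt(var + 1e-8), mean, var
```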

4.2. Experimental Settings

4.2.1. Reproducibility

The dimension of the hidden state was fixed to 64. The temporal and spatial modules each contained a single layer (one LSTM layer and one MLP layer, respectively). We adopted ReLU as the activation function in the MLP components and the tanh/sigmoid functions inherent to the LSTM in the temporal module. The model was trained using the Adam optimizer with an initial learning rate of 0.001, which was adaptively adjusted during training: a learning rate scheduler with a patience of 10 epochs and a decay factor of 0.1 was employed. Training used a batch size of 64 for up to 1000 epochs, with the sampling window size $n$ fixed at 10.
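These settings map directly onto standard PyTorch components; a configuration sketch (the model stand-in and loop body are placeholders, not the authors' code):

```python
import torch
import torch.nn as nn

model = nn.Linear(64, 64)  # stand-in for a TSFNN instance (hidden size 64)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.1, patience=10)  # decay by 0.1 after 10 stagnant epochs

for epoch in range(1000):  # batch size 64, window size n = 10 during sampling
    epoch_loss = 0.0       # placeholder: accumulate the masked MAE over batches here
    scheduler.step(epoch_loss)  # adapt the learning rate on plateaus
```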
Our model contains a total of 313,401 trainable parameters, corresponding to a memory footprint of approximately 1.20 MB at 32-bit floating-point precision. On average, each training epoch took 7 s, and the full inference pass required around 10 s on a machine with an Intel Core i7-12700H CPU (2.30 GHz; Intel, Santa Clara, CA, USA) and an NVIDIA RTX 3090 GPU (NVIDIA, Santa Clara, CA, USA).

4.2.2. Baselines

We selected ten widely used baselines to evaluate the performance of TSFNN. The traditional machine learning and statistical methods included the following:
(1)
Mean [1]: This method fills each missing value (initially set to 0) with the average of the preceding and succeeding observations.
(2)
KNN [11]: This approach identifies the k-nearest neighbors of a given sample and computes their mean value to perform interpolation. For the dataset used in this study, the best performance was obtained with k = 2.
(3)
MissRandomForest (MRF) [10]: This method is a widely adopted strategy for handling missing data, leveraging the random forest algorithm to predict and impute missing values iteratively.
The deep learning-based methods included the following:
(1)
RNN [14]: This method utilizes an LSTM-based architecture to model temporal dependencies, with the specific aim of imputing missing values.
(2)
Bi-RNN [27]: This method extends the RNN method to bidirectional and uses bidirectional LSTM for more complete learning of temporal information.
(3)
TIDER [22]: A deep learning model for multivariate time series imputation, which enhances the imputation effect by decoupling time dynamic factors such as trend, periodicity, and residual.
(4)
GRIN [23]: A bidirectional graph Recurrent Neural Network consisting of two unidirectional GRIN sub-modules that perform two-stage interpolation for each direction, thereby processing the input sequence progressively in both forward and backward directions over time.
(5)
TimesNet [24]: This model employs a modular architecture to decompose complex temporal dynamics into multiple cycles and achieves unified modeling of both intra-cycle and inter-cycle variations by transforming the original one-dimensional time series into a two-dimensional representation.
(6)
Transformer: Directly uses an attention mechanism for imputation.
(7)
DLinear [25]: Decomposes the original sequence into two parts, trend and seasonality, typically through simple methods such as moving averages. It models the trend and seasonal components separately using independent linear layers and then combines the output results to obtain the final prediction value.

4.3. Main Result

As shown in Table 1 and Table 2, the MEAN method consistently produced the lowest accuracy across all tested missing rates. This limitation arises from its simplistic strategy of interpolating by averaging the values immediately before and after the gap, which performs poorly when faced with long sequences of missing data or pronounced nonlinear patterns. In contrast, the two traditional machine learning approaches, KNN and MRF, achieved markedly better results. KNN identified the nearest neighbors of a missing point based on the Euclidean distance and imputed its value using their mean, providing more relevant reference points than the MEAN method and thus yielding higher accuracy. The MRF, on the other hand, estimated missing entries by iteratively building decision trees, focusing on modeling the dataset's overall trend rather than depending solely on local information, which enhanced its interpolation effectiveness. The RNN adopted a unidirectional architecture to learn temporal information, which captured temporal features better than the preceding methods and could thus infer the overall trend of the data. However, because of the sequential nature of the data, unidirectional learning often could not cover the complete information in the series, so its interpolation has certain limitations. Bi-RNN adopted a bidirectional architecture to learn the temporal information of the data in both the forward and backward directions, striving to capture temporal relationships comprehensively. When the missing rate is relatively low, RNN-based methods may not be as effective as the aforementioned machine learning algorithms; as the missing rate increases, their advantages become apparent. TIDER and DLinear relied on explicit decomposition of temporal components (e.g., trend and seasonality), which helped under regular patterns but limited their flexibility on complex traffic data. GRIN modeled bidirectional temporal and graph structures but was sensitive to high missing ratios due to accumulated errors. TimesNet captured periodicity effectively but lacked spatial modeling. Transformer performed moderately but suffered from disrupted attention when missing rates were high. TSFNN exhibited the best performance at all missing rates owing to its incorporation of spatial features. We selected the four datasets with 70% missing values for illustration, as depicted in Figure 6. As shown in the figure, the imputation curve of our method closely aligns with the true values, demonstrating the effectiveness of our model. The performance of all methods deteriorates as the missing rate increases, partly because the remaining data become increasingly sparse, leaving less information from which to extract features.

4.4. Ablation Experiment

In contrast to standard RNN frameworks, TSFNN integrates a spatial interpolation module in addition to its temporal interpolation component. Furthermore, the temporal module adopts a bidirectional interpolation design, setting it apart from conventional RNN-based methods. To evaluate the role of each architectural element, we conducted ablation studies in which specific modules were removed or altered. The resulting model variants are outlined as follows:
  • TSFNN-t: This variant removes the temporal module from TSFNN, thereby relying solely on the spatial module to capture spatial dependencies without modeling temporal correlations.
  • TSFNN-s: This variant removes the spatial interpolation module from TSFNN, thereby relying solely on the temporal module to capture temporal dependencies without modeling spatial correlations.
  • TSFNN-bi: This variant modifies the temporal module by replacing the bidirectional structure with a unidirectional structure, thereby capturing temporal dependencies in only one direction.
Table 3 and Table 4 demonstrate that TSFNN outperformed all the variant models, and its superiority became increasingly evident on all datasets as the missing rate increased. The results of TSFNN-t show that using the MLP alone to extract spatial correlations from the incomplete input is clearly insufficient to achieve good interpolation results, which underscores the fundamental need to capture temporal correlations when estimating traffic data. Substituting the temporal module's bidirectional structure with a unidirectional one, as in TSFNN-bi, led to a noticeable drop in interpolation accuracy. This decline stems from the reduced capacity to learn contextual temporal dependencies in both directions, weakening data reconstruction and, consequently, interpolation quality. Similarly, removing the spatial module in TSFNN-s resulted in poorer outcomes, underscoring that spatial dependency modeling is equally vital for effective interpolation. Taken together, the findings indicate that every component within TSFNN plays a significant part in achieving its overall performance.

5. Conclusions

In this work, we adopted the TSFNN to investigate traffic flow datasets affected by various missing ratios and sequences of continuous data loss. Unlike conventional methods that often neglect either temporal or spatial dependencies, the proposed framework simultaneously leverages both to guide the imputation process. By capturing these spatio-temporal patterns, the TSFNN reconstructs incomplete data more precisely, improving overall dataset integrity and supporting reliable stability assessments.
The main contributions are as follows:
  • By utilizing time and space modules aimed at capturing temporal and spatial correlations, we effectively alleviate the traditional challenges outlined earlier. The temporal module integrates a bidirectional LSTM architecture, which helps to enhance the capture of temporal dependencies. Meanwhile, the spatial module utilizes MLP to proficiently capture spatial correlations.
  • The experimental results show that the TSFNN achieves improvements in the RMSE compared to benchmark approaches, including MEAN, KNN, MRF, RNN, and Bi-RNN. This demonstrates the framework’s advantage in interpolation accuracy.
  • From the ablation experiment, it can be seen that each module of the TSFNN has a unique role and is indispensable.
Although our experiments were conducted on freeway-based PeMS datasets, the modular design of TSFNN is not specific to highway scenarios. The separation of temporal and spatial modules allows it to adapt to other traffic environments, such as arterial roads and urban intersections, which also exhibit spatio-temporal patterns. Furthermore, the framework is potentially applicable to other domains involving spatio-temporal missing data, such as meteorological forecasting, environmental sensing, or healthcare monitoring. Future work will explore domain adaptation to broaden the TSFNN’s applicability.

Author Contributions

Methodology, G.W.; Formal analysis, X.L.; Investigation, H.L.; Writing—original draft, Y.M.; Writing—review & editing, K.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study received funding from the National Key R&D Program of China under grant 2022YFB2602103; the National Natural Science Foundation of China under Grant No. U2469205; the Fundamental Research Funds for the Central Universities of China under Grant No. JKF-20240769; the Beijing Nova Program under Grant No. 20230484353. The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article, or the decision to submit it for publication.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

All authors are affiliated with universities or public research institutions and have no commercial affiliations. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Kreindler, D.M.; Lumsden, C.J. The Effects of the Irregular Sample and Missing Data in Time Series Analysis. Nonlinear Dyn. Psychol. Life Sci. 2006, 10, 187–214. [Google Scholar]
  2. Benesty, J.; Chen, J.; Huang, Y. Time-delay estimation via linear interpolation and cross correlation. IEEE Trans. Speech Audio Process. 2004, 12, 509–519. [Google Scholar] [CrossRef]
  3. Gasca, M.; Sauer, T. Polynomial interpolation in several variables. Adv. Comput. Math. 2000, 12, 377–410. [Google Scholar] [CrossRef]
  4. McKinley, S.; Levine, M. Cubic spline interpolation. Coll. Redwoods 1998, 45, 1049–1060. [Google Scholar]
  5. Luo, X.; Meng, X.; Gan, W.; Chen, Y. Traffic data imputation algorithm based on improved low-rank matrix decomposition. J. Sens. 2019, 2019, 7092713. [Google Scholar] [CrossRef]
  6. Mazumder, R.; Hastie, T.; Tibshirani, R. Spectral regularization algorithms for learning large incomplete matrices. J. Mach. Learn. Res. 2010, 11, 2287–2322. [Google Scholar]
  7. Yu, H.-F.; Rao, N.; Dhillon, I.S. Temporal regularized matrix factorization for high-dimensional time series prediction. Adv. Neural Inf. Process. Syst. 2016, 29, 847–855. [Google Scholar]
  8. Schnabel, T.; Swaminathan, A.; Singh, A.; Chandak, N.; Joachims, T. Recommendations as treatments: Debiasing learning and evaluation. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016. [Google Scholar]
  9. Al-Douri, Y.K.; Hamodi, H.; Lundberg, J. Time series forecasting using a two-level multi-objective genetic algorithm: A case study of maintenance cost data for tunnel fans. Algorithms 2018, 11, 123. [Google Scholar] [CrossRef]
  10. Stekhoven, D.J.; Bühlmann, P. MissForest—Non-parametric missing value imputation for mixed-type data. Bioinformatics 2012, 28, 112–118. [Google Scholar] [CrossRef]
  11. Qian, C.; Chen, J.; Luo, Y.; Dai, L. Random forest based operational missing data imputation for highway tunnel. J. Transp. Syst. Eng. Inf. Technol. 2016, 16, 81. [Google Scholar]
  12. Zhang, J.; Li, D.; Wang, Y. Predicting tunnel squeezing using a hybrid classifier ensemble with incomplete data. Bull. Eng. Geol. Environ. 2020, 79, 3245–3256. [Google Scholar] [CrossRef]
  13. Kim, B.; Lee, D.-E.; Preethaa, K.R.S.; Hu, G.; Natarajan, Y.; Kwok, K.C.S. Predicting wind flow around buildings using deep learning. J. Wind. Eng. Ind. Aerodyn. 2021, 219, 104820. [Google Scholar] [CrossRef]
  14. Che, Z.; Purushotham, S.; Cho, K.; Sontag, D.; Liu, Y. Recurrent neural networks for multivariate time series with missing values. Sci. Rep. 2018, 8, 6085. [Google Scholar] [CrossRef]
  15. Guo, D.; Li, J.; Li, X.; Li, Z.; Li, P.; Chen, Z. Advance prediction of collapse for TBM tunneling using deep learning method. Eng. Geol. 2022, 299, 106556. [Google Scholar] [CrossRef]
  16. Adeyemi, O.; Grove, I.; Peets, S.; Domun, Y.; Norton, T. Dynamic neural network modelling of soil moisture content for predictive irrigation scheduling. Sensors 2018, 18, 3408. [Google Scholar] [CrossRef]
  17. Liang, Y.; Jiang, K.; Gao, S.; Yin, Y. Prediction of tunnelling parameters for underwater shield tunnels, based on the GA-BPNN method. Sustainability 2022, 14, 13420. [Google Scholar] [CrossRef]
  18. Wang, Y.; Pang, Y.; Song, X.; Sun, W. Tunneling Operational Data Imputation with Radial Basis Function Neural Network. In Proceedings of the International Joint Conference on Energy, Electrical and Power Engineering, Melbourne, VIC, Australia, 22–24 November 2023. [Google Scholar]
  19. González-Vidal, A.; Rathore, P.; Rao, A.S.; Mendoza-Bernal, J.; Palaniswami, M.; Skarmeta-Gómez, A.F. Missing data imputation with bayesian maximum entropy for internet of things applications. IEEE Internet Things J. 2020, 8, 16108–16120. [Google Scholar] [CrossRef]
  20. Wang, G.; Deng, Z.; Choi, K.S. Tackling missing data in community health studies using additive LS-SVM classifier. IEEE J. Biomed. Health Inform. 2016, 22, 579–587. [Google Scholar] [CrossRef]
  21. Erhan, L.; Di Mauro, M.; Anjum, A.; Bagdasar, O.; Song, W.; Liotta, A. Embedded data imputation for environmental intelligent sensing: A case study. Sensors 2021, 21, 7774. [Google Scholar] [CrossRef]
  22. Liu, S.; Li, X.; Cong, G.; Chen, Y.; Jiang, Y. Multivariate time-series imputation with disentangled temporal representations. In Proceedings of the Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
  23. Cini, A.; Marisca, I.; Alippi, C. Filling the g_ap_s: Multivariate time series imputation by graph neural networks. arXiv 2021, arXiv:2108.00298. [Google Scholar]
  24. Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; Long, M. Timesnet: Temporal 2d-variation modeling for general time series analysis. arXiv 2022, arXiv:2210.02186. [Google Scholar]
  25. Zeng, A.; Chen, M.; Zhang, L.; Xu, Q. Are transformers effective for time series forecasting? Proc. AAAI Conf. Artif. Intell. 2023, 37, 11121–11128. [Google Scholar] [CrossRef]
  26. Graves, A. Long short-term memory. In Supervised Sequence Labelling with Recurrent Neural Networks; Springer: Berlin/Heidelberg, Germany, 2012; pp. 37–45. [Google Scholar] [CrossRef]
  27. Zhou, P.; Shi, W.; Tian, J.; Qi, Z.; Li, B.; Hao, H.; Xu, B. Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016; Volume 2. Short papers. [Google Scholar]
  28. Meulman, J.J. Optimal Scaling Methods for Multivariate Categorical Data Analysis. SPSS White Paper Chicago 1998. Available online: https://www.researchgate.net/profile/Jacqueline-Meulman/publication/268274402_Optimal_scaling_methods_for_multivariate_categorical_data_analysis/links/553625040cf218056e92cab7/Optimal-scaling-methods-for-multivariate-categorical-data-analysis.pdf (accessed on 5 April 2024).
Figure 1. Schematic illustration of the process for converting traffic flow data into a multivariate time series. With $D$ sensors recording data over $T$ time points, aligning the measurements by timestamp produces a set of multivariate sequences. Formally, if $S_d = \{x_1^d, x_2^d, \ldots, x_T^d\}$ denotes the monitoring sequence of the $d$-th sensor, the resulting multivariate time series can be expressed as $X = [S_1^\top\ S_2^\top\ \cdots\ S_D^\top]^\top$.
Figure 2. Example of a multivariate time series with four dimensions and a sequence length of seven, containing several missing entries. The two sequences on the right correspond to the masking vector $m_t$ and the time interval vector $\gamma_t$. The computation of $\gamma_t$ is based on the observation times of $x_1$ to $x_7$, which are given by $\{H_1, H_2, \ldots, H_7\} = \{0, 3, 5, 6, 9, 11, 14\}$. The lower panel illustrates the derivation of the interval vector for the second dimension.
Figure 3. Framework of TSFNN. White circles denote missing values, black lines indicate the connections between observed and missing values within each layer, and blue lines depict the links between interpolation results.
Figure 4. Architecture of TSFNN. The diagram on the left depicts the component responsible for capturing temporal features, while the diagram on the right illustrates the component dedicated to capturing spatial features.
Figure 5. Architecture of the LSTM unit. In the figure, $h_t$ represents the hidden state, $x_t$ the input data, $c_t$ the cell state, $\sigma$ the sigmoid function, $\tanh$ the tanh function, and $\odot$ element-wise multiplication.
Figure 6. The black crosses denote the observed values, the red circles indicate the ground-truth targets for imputation, and the blue curve illustrates the interpolation results.
Table 1. Results of MAE (lower is better).

Dataset | Missing | MEAN | KNN | MRF | RNN | Bi-RNN | TIDER | GRIN | TimesNet | Transformer | DLinear | TSFNN
PEMS03 | 30% | 86.5960 | 13.5931 | 8.8696 | 20.2834 | 14.4200 | 13.6782 | 22.0279 | 14.3159 | 17.0793 | 17.4783 | 7.8933
PEMS03 | 50% | 86.6192 | 14.5375 | 10.3390 | 20.6516 | 13.1034 | 14.9002 | 22.8457 | 15.1542 | 17.5090 | 20.6887 | 9.2397
PEMS03 | 70% | 86.6089 | 16.8791 | 12.6183 | 21.3428 | 13.6127 | 19.3608 | 53.8453 | 16.7718 | 17.9871 | 25.4095 | 12.0002
PEMS04 | 30% | 104.1736 | 19.5579 | 17.4829 | 27.4679 | 18.6534 | 24.3212 | 29.6091 | 20.6891 | 23.2935 | 22.7144 | 17.0629
PEMS04 | 50% | 104.4697 | 21.3425 | 18.8238 | 28.0665 | 19.1650 | 26.2593 | 30.7867 | 21.7101 | 23.6364 | 26.5238 | 18.3877
PEMS04 | 70% | 104.3353 | 24.6176 | 21.4658 | 29.1995 | 19.9096 | 30.5007 | 52.4402 | 23.5835 | 24.0626 | 32.1466 | 19.6327
PEMS07 | 30% | 122.9596 | 18.5717 | 15.5705 | 31.4786 | 22.7469 | 23.9575 | 29.6091 | 22.6178 | 30.1796 | 25.5388 | 13.4006
PEMS07 | 50% | 122.9921 | 19.6671 | 17.8137 | 31.4512 | 22.4689 | 25.2355 | 24.5671 | 23.8183 | 30.4364 | 30.4100 | 15.8010
PEMS07 | 70% | 123.0093 | 22.6324 | 20.3918 | 32.2756 | 23.0151 | 29.1904 | 26.4177 | 26.2248 | 30.7599 | 37.4638 | 19.0602
PEMS08 | 30% | 88.3220 | 15.2928 | 13.5444 | 22.1556 | 14.3491 | 21.4088 | 32.7162 | 15.9736 | 19.9493 | 18.4736 | 13.0557
PEMS08 | 50% | 88.3421 | 17.4871 | 15.0658 | 22.8810 | 15.2291 | 24.5186 | 35.1821 | 16.9278 | 20.5979 | 21.7813 | 14.6855
PEMS08 | 70% | 88.3246 | 23.6218 | 17.9481 | 24.4446 | 16.2238 | 29.5935 | 56.3692 | 18.6909 | 21.1650 | 26.6519 | 16.1812
Table 2. Results of RMSE (lower is better).

Dataset | Missing | MEAN | KNN | MRF | RNN | Bi-RNN | TIDER | GRIN | TimesNet | Transformer | DLinear | TSFNN
PEMS03 | 30% | 110.1345 | 21.1594 | 15.8544 | 41.4177 | 22.8208 | 24.1594 | 189.5832 | 22.0651 | 30.0170 | 25.4199 | 14.4309
PEMS03 | 50% | 110.1671 | 23.0194 | 18.4147 | 41.7044 | 21.8967 | 25.8220 | 197.9745 | 23.1651 | 30.6173 | 30.2082 | 17.3487
PEMS03 | 70% | 110.1274 | 27.3572 | 22.1713 | 42.2386 | 22.7241 | 32.9966 | 674.0384 | 25.5019 | 31.2884 | 37.2414 | 21.7693
PEMS04 | 30% | 128.8029 | 31.8871 | 29.9563 | 51.7645 | 31.8932 | 39.4678 | 184.0275 | 32.0212 | 38.1795 | 33.1983 | 29.0194
PEMS04 | 50% | 129.0451 | 35.1147 | 32.3576 | 52.4029 | 32.4044 | 40.7105 | 196.4327 | 33.1468 | 38.5641 | 38.1220 | 30.8753
PEMS04 | 70% | 129.0190 | 40.4698 | 36.9898 | 53.4372 | 33.7128 | 45.6582 | 364.8943 | 35.4149 | 39.0581 | 45.6298 | 33.8815
PEMS07 | 30% | 150.3760 | 29.1972 | 26.7799 | 61.2508 | 38.4737 | 38.7567 | 184.0275 | 34.9276 | 49.6261 | 36.2022 | 24.9834
PEMS07 | 50% | 150.4317 | 31.0891 | 27.9772 | 61.4552 | 38.0274 | 141.3220 | 77.0239 | 36.2026 | 50.3366 | 42.6641 | 26.3939
PEMS07 | 70% | 150.4381 | 36.6143 | 39.3241 | 62.1628 | 38.3965 | 148.1088 | 95.9151 | 39.0316 | 51.0977 | 52.2234 | 34.8978
PEMS08 | 30% | 111.1832 | 24.9221 | 24.3443 | 43.5752 | 23.5005 | 35.4572 | 227.4614 | 24.7047 | 31.3339 | 26.5831 | 22.0545
PEMS08 | 50% | 111.2393 | 29.4312 | 27.5831 | 44.5003 | 24.8683 | 39.8298 | 247.0275 | 25.8617 | 32.5438 | 31.1323 | 24.3064
PEMS08 | 70% | 111.2205 | 40.2518 | 33.0293 | 46.3633 | 26.8602 | 44.8765 | 410.6631 | 28.1603 | 33.6863 | 37.9274 | 26.5616
Table 3. Ablation experimental results of MAE (lower is better).

Dataset | Missing | TSFNN-t | TSFNN-s | TSFNN-bi | TSFNN
PEMS03 | 30% | 16.4354 | 13.5469 | 11.9188 | 7.8933
PEMS03 | 50% | 18.5576 | 14.9184 | 12.3981 | 9.2397
PEMS03 | 70% | 21.3000 | 15.7012 | 14.4773 | 12.0002
PEMS04 | 30% | 24.5247 | 20.3452 | 19.7419 | 17.0629
PEMS04 | 50% | 27.1700 | 19.7062 | 20.8666 | 18.3877
PEMS04 | 70% | 30.8374 | 20.6436 | 22.9267 | 19.6327
PEMS07 | 30% | 24.8226 | 21.4891 | 18.2585 | 13.4006
PEMS07 | 50% | 28.1664 | 22.4671 | 20.5230 | 15.8010
PEMS07 | 70% | 32.7337 | 26.6731 | 25.0748 | 19.0602
PEMS08 | 30% | 21.9738 | 15.0392 | 15.1561 | 13.0557
PEMS08 | 50% | 24.6872 | 16.9425 | 16.5508 | 14.6855
PEMS08 | 70% | 28.1863 | 17.5790 | 18.5238 | 16.1812
Table 4. Ablation experimental results of RMSE (lower is better).

Dataset | Missing | TSFNN-t | TSFNN-s | TSFNN-bi | TSFNN
PEMS03 | 30% | 25.8106 | 22.6489 | 20.9601 | 14.4309
PEMS03 | 50% | 28.5744 | 22.7909 | 20.8367 | 17.3487
PEMS03 | 70% | 32.1357 | 24.0021 | 22.1713 | 21.7693
PEMS04 | 30% | 38.4176 | 31.6457 | 32.9907 | 29.0194
PEMS04 | 50% | 41.4574 | 32.6511 | 34.4222 | 30.8753
PEMS04 | 70% | 45.8229 | 34.5772 | 37.0240 | 33.8815
PEMS07 | 30% | 39.1665 | 34.8289 | 31.7419 | 24.9834
PEMS07 | 50% | 43.7269 | 34.7189 | 34.7457 | 26.3939
PEMS07 | 70% | 49.5546 | 39.0075 | 40.3500 | 34.8978
PEMS08 | 30% | 35.4606 | 25.4237 | 25.4066 | 22.0545
PEMS08 | 50% | 38.8308 | 28.5472 | 27.7381 | 24.3064
PEMS08 | 70% | 43.0485 | 27.4608 | 31.0172 | 26.5616