Article

Scale-Fusion Transformer: A Medium-to-Long-Term Forecasting Model for Parking Space Availability

1 School of Advanced Interdisciplinary Studies, Hunan University of Technology and Business, Changsha 410205, China
2 Xiangjiang Laboratory, Changsha 410205, China
3 The College of Artificial Intelligence, Zhuhai City Polytechnic, Zhuhai 519090, China
4 School of Information Technology and Management, Hunan University of Finance and Economics, Changsha 410205, China
* Authors to whom correspondence should be addressed.
Electronics 2025, 14(18), 3636; https://doi.org/10.3390/electronics14183636
Submission received: 2 August 2025 / Revised: 8 September 2025 / Accepted: 12 September 2025 / Published: 14 September 2025
(This article belongs to the Special Issue Digital Intelligence Technology and Applications)

Abstract

Urban parking spaces are key city resources that directly affect how easily people get around and the quality of their daily travel. Accurately predicting future parking space availability can improve the efficiency of parking space use; for instance, it can enhance smart parking applications such as shared parking and EV charging scheduling. However, because parking behavior is dynamic and constantly changing, predicting parking space availability over the medium-to-long term is challenging. This paper proposes a Scale-Fusion Transformer model (SFFormer) to address dynamic changes in parking space availability caused by complex parking behaviors, as well as the challenges of medium-to-long-term prediction modeling. The three key innovations are as follows: (1) a scale-fusion module integrating short-term and long-term parking trends, (2) an adaptive data compression mechanism for multi-scale prediction tasks, and (3) a Transformer-encoder-based architecture for capturing temporal patterns, adaptable to diverse parking lots and long-term prediction scenarios. Experiments on real parking datasets demonstrate that the SFFormer model significantly outperforms state-of-the-art models such as iTransformer, PatchTST, DLinear, and Autoformer.

1. Introduction

Parking prediction, as a core technology for alleviating urban parking challenges and promoting intelligent transportation development, plays an irreplaceable role in optimizing resource allocation, enhancing quality of life, and fostering sustainable urban development [1]. In the long term, its primary value lies in optimizing the utilization of parking resources and enhancing urban traffic efficiency. By providing accurate predictions of parking availability, it helps reduce time spent searching for parking and supports better-informed urban management decisions. Whether for improving short-term travel experiences or building the long-term urban ecosystem, parking prediction is a key link between technological innovation and public welfare and has become an essential part of modern urban governance.
Most parking prediction methods target short-term forecasting [2,3]. Though effective for real-time parking guidance, these models struggle to handle long-term fluctuations amid evolving urban demands. The key distinction is that short-term prediction (1–2 h) addresses immediate needs using high-frequency data to capture instant changes. Medium-/long-term prediction (24+ h to months) integrates short-term fluctuations with sustained trends (e.g., weekly/seasonal patterns) to enable strategic parking planning and resource allocation.
Medium-to-long-term time series forecasting methods can be divided into two categories according to how the forecasting horizon is produced. Iterative single-step prediction repeats one-step forecasts, with each step relying on the previous result, so early errors accumulate and grow. Because it ignores long-term temporal dependencies, it is efficient and structurally simple for short-term prediction but struggles to meet long-term forecasting needs [4], as shown in Figure 1. In contrast, direct medium-to-long-term prediction outputs multi-step results at once, which avoids error propagation, captures long-term patterns, and offers good stability, effectively supporting parking resource allocation and planning. However, such models are more complex, place higher demands on data quantity and quality, and offer limited real-time performance [5,6]; this trade-off is also reflected in Figure 1.
Therefore, in parking prediction, iterative single-step prediction is suitable for real-time dynamic adjustment but has significant long-term deviations, whereas medium-to-long-term prediction supports operation planning but at the cost of higher complexity.
The current challenges in medium-to-long-term parking space availability prediction mainly concern three aspects: (1) multi-scale temporal features (intra-day fluctuations, weekly/monthly cycles, and long-term trends) are strongly coupled, making it difficult for traditional models to integrate them accurately; (2) parsing cross-scale dependencies (e.g., correlations between minute-level and daily-level features) is challenging, limiting the modeling of complex temporal patterns; (3) adaptation to scenario heterogeneity (differences between commercial, residential, and other areas) is insufficient, leading to poor generalization in long-sequence forecasting. SFFormer achieves targeted breakthroughs: it alleviates multi-scale coupling via a scale-fusion module, parses cross-scale correlations through multi-scale decomposition and fusion, and enhances scenario adaptability with adaptive compression and a Transformer architecture, ensuring stability in long-sequence forecasting. Therefore, this paper proposes a medium-to-long-term prediction model for parking space availability, aiming to address the insufficient accuracy of existing medium-to-long-term parking space predictions. The SFFormer model integrates parking data across multiple temporal scales, capturing dynamic parking availability trends through cross-scale feature fusion to adapt to diverse parking scenarios.
Existing studies have advanced time-series forecasting through diverse methodological contributions. Guo et al. [7] proposed a multi-scale recurrent network integrating scale attention and cross-scale guidance for multivariate time series forecasting. This network can capture features at different time scales, focus on key scale information through the scale attention mechanism, and achieve effective fusion of features across scales with the help of cross-scale guidance, thereby improving forecasting performance. Zhou et al. [8] proposed the Frequency Improved Legendre Memory Model (FiLM), which utilizes Legendre polynomial projection to approximate historical information, employs Fourier projection for denoising, and incorporates low-rank approximation to accelerate computation, effectively enhancing the accuracy of long-term time-series forecasting for both multivariate and univariate data.
These explorations not only enriched the methodological toolkit for time-series forecasting by integrating mechanisms like scale attention and polynomial-frequency hybrid processing but also demonstrated the effectiveness of targeted technical improvements in enhancing long-term prediction performance across diverse scenarios. However, in the broader field of time-series forecasting, despite these advancements, existing studies still face limitations in balancing the capture of long-term temporal dependencies with robustness to high-noise data, especially when handling complex nonlinear patterns in extended forecasting horizons. This constrains their further application in high-stakes domains requiring both accuracy and reliability over prolonged periods.
To address modeling challenges arising from complex urban parking behaviors, particularly in predicting dynamic space variations and in mid-to-long-term forecasting, this study proposes SFFormer, which significantly enhances prediction accuracy through a long-short-period scale-fusion module. Its key innovations manifest in three aspects:
(a) Multi-scale feature decomposition: Time-series patching decomposes parking data into short-period local features and long-period global patterns. Dimensional alignment and fusion are achieved via Adaptive Average Pooling (AAP), overcoming traditional single-scale models’ limitations in concurrent multi-scale pattern extraction.
(b) Transformer-based temporal modeling: A Transformer-encoder architecture processes the scale-fused features with adaptive average pooling. This framework captures complex temporal dependencies while remaining adaptable across parking facilities and supporting extended forecasting horizons.
(c) Task-adaptive compression: An adaptive data compression mechanism handles potential multi-task demands in smart parking services. Techniques such as adaptive average pooling enhance flexibility for diverse prediction tasks (24–720 steps, i.e., 2 h to 2.5 days), enabling applications ranging from real-time parking guidance to long-term infrastructure planning.

2. Related Work

As a key component of intelligent transportation systems, parking prediction has achieved significant advancements in model architecture and feature fusion in recent years. Existing studies widely employ machine learning and deep learning methods, focusing not only on integrating geospatial information but also on exploring innovations in spatiotemporal sequence modeling techniques—ranging from recurrent neural networks such as LSTM and GRU, to spatiotemporal graph neural networks like ST-GRAT, and further to Transformer-based models including Informer [9] and Autoformer [10]. These diverse methods have effectively enhanced prediction performance. Meanwhile, multi-factor fusion, dynamic heterogeneity modeling, and model efficiency optimization have become research priorities, providing multiple approaches to address core issues in parking data, such as spatiotemporal correlation, periodicity, and adaptability to complex scenarios. Furthermore, the critical importance of explicitly addressing temporal scale disparities extends beyond the domain of predictive modeling. In resource management, particularly for Green Internet of Things (IoT) networks, the dual time-scale resource management strategy proposed by Zhang et al. [11] has demonstrated significant advantages. This approach enhances resource efficiency and reduces energy consumption by decoupling and coordinating decisions operating across distinct temporal scales. This evidence underscores the broader applicability and value of multi-scale temporal analysis as a fundamental paradigm for addressing complex, dynamic system challenges, further validating the rationale for our multi-scale feature fusion methodology in medium-to-long-term parking availability forecasting.
Balmer et al. [12] integrated geospatial information with random forest and neural networks to enhance roadside parking occupancy prediction accuracy. The proposed model outperformed conventional approaches lacking geospatial data, thereby facilitating smarter urban parking management. Feng et al. [13] proposed a densely connected ConvLSTM model for parking availability prediction. By leveraging convolutional operations, the model captures spatial correlations across parking zones, while the LSTM structure models temporal dependencies. The dense connections further enhance feature propagation and alleviate gradient vanishing, resulting in improved zone-level parking space availability forecasting. Zhou et al. [14] combined Transformer with frequency-enhanced seasonal-trend decomposition. Fourier transform was utilized to reduce computational complexity, enabling the Transformer to capture global long-term features with linear sequence complexity—thereby addressing limitations of traditional Transformers. Wu et al. [15] introduced TimesNet, which employs Fourier analysis to extract multi-periodicity from sequences. The method transforms 1D time-series into 2D tensors and aggregates features to resolve challenges in modeling long-term dependencies and temporal relationships within complex multi-period data, demonstrating strong performance in time-series tasks. Park et al. [16] proposed ST-GRAT, incorporating spatial-temporal attention modules and a spatial sentinel module. This design addresses limitations of traditional models in handling dynamic spatial relationships, capturing long-term correlations, and incorporating graph structures, leading to improved traffic speed prediction accuracy. Liu et al. [17] integrated graph-structured traffic network information into large language models (LLMs) to enhance the accuracy and generalization of traffic prediction. The proposed ST-LLM+ explicitly models spatial dependencies between traffic nodes and dynamic temporal patterns, overcoming the limitations of conventional models in capturing complex spatio-temporal correlations. Ju et al. [18] created COOL, a joint spatio-temporal graph neural network. This model integrates high-order interactions through heterogeneous graphs, fuses node semantic information, and employs a multi-scale decoder to capture traffic pattern diversity, significantly improving long-term prediction accuracy. Jin et al. [19] proposed TransGTR, a transferable graph structure learning framework for cross-city traffic forecasting, which jointly learns and transfers graph structures and forecasting models via city-agnostic node features and temporal decoupled regularization. The framework dynamically captures spatio-temporal dependencies and aligns spatial feature distributions between cities, enabling accurate predictions in data-scarce urban networks. The CRINet model proposed by Feng et al. [20] effectively enhances the accuracy of short- and long-term predictions for multi-pollutant concentrations through its correlation-split and recombination-sort strategies. Its feature interaction approach for multivariate time series provides valuable insights for the design of the scale-fusion module in SFFormer.

3. Model Description

3.1. Problem Definition

This study focuses on medium-to-long-term forecasting of available parking spaces, formally defined as predicting the values of the next $T$ time steps given a univariate historical sequence $(x_1, \ldots, x_L)$ of length $L$. Specifically, a lookback window of length $L$ (576 time steps, equivalent to 2 days of historical data) serves as input to forecast parking space availability for 24 to 720 future steps (corresponding to 2 h to 2.5 days), as illustrated in Figure 2. To address this task, the SFFormer architecture was designed, utilizing an efficient Transformer encoder as its core component. The model structure is depicted in Figure 3.
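For concreteness, training pairs of this shape can be sliced from a univariate occupancy series as in the following sketch (illustrative code, not the authors' implementation; all names are placeholders):

```python
import numpy as np

def make_windows(series: np.ndarray, lookback: int = 576, horizon: int = 24):
    """Slice a univariate series into (input, target) pairs.

    series:   1-D array of available-space counts (5-min resolution).
    lookback: input length L (576 steps = 2 days, as in the paper).
    horizon:  prediction length T (24-720 steps).
    """
    xs, ys = [], []
    for start in range(len(series) - lookback - horizon + 1):
        xs.append(series[start : start + lookback])
        ys.append(series[start + lookback : start + lookback + horizon])
    return np.stack(xs), np.stack(ys)

# Example: one month of 5-min data (30 days * 288 steps/day)
series = np.random.rand(30 * 288)
X, Y = make_windows(series, lookback=576, horizon=24)
print(X.shape, Y.shape)  # (8041, 576) (8041, 24)
```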

3.2. SFFormer Model Architecture

The SFFormer model is a deep learning framework for time-series forecasting. Its core idea is to capture spatio-temporal dependencies via multi-scale feature fusion and a Transformer encoder, enhancing medium-to-long-term prediction accuracy. The input is a historical time series $x^{(i)} \in \mathbb{R}^{1 \times L}$. After normalization, scale fusion, and Transformer encoding, the prediction result $y^{(i)} \in \mathbb{R}^{1 \times T}$ is output via a linear head. Typical applications include parking space availability forecasting and other time-series prediction tasks.
Patching operation: The input univariate time series $x^{(i)} \in \mathbb{R}^{1 \times L}$ is segmented into potentially overlapping patches. Here, $L$ denotes the input sequence length, while $i$ represents the sample index. The patch length is defined as $P$, with $S$ indicating the stride between consecutive patches. This patching process yields a patch sequence $x_p^{(i)} \in \mathbb{R}^{P \times N}$, where $N$ corresponds to the number of patches.
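Such a patching step can be sketched with a strided unfold; the patch length and stride below are illustrative placeholders rather than the paper's settings:

```python
import torch

def patch(x: torch.Tensor, patch_len: int, stride: int) -> torch.Tensor:
    """Segment a batch of univariate series into (possibly overlapping) patches.

    x:         (batch, L) input sequences.
    patch_len: P, length of each patch.
    stride:    S, step between consecutive patch starts.
    Returns    (batch, P, N) with N = (L - P) // S + 1 patches per series.
    """
    patches = x.unfold(dimension=-1, size=patch_len, step=stride)  # (batch, N, P)
    return patches.transpose(1, 2)                                 # (batch, P, N)

x = torch.randn(8, 576)                  # batch of 8 lookback windows
xp = patch(x, patch_len=48, stride=48)
print(xp.shape)                          # torch.Size([8, 48, 12])
```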
In the scale-fusion module, information mining was performed on the input data $x^{(i)}$ to integrate extracted short-term and long-term scale features. A patching operation was first applied to partition $x^{(i)}$, extracting local features to generate $x_p^{\mathrm{short},(i)} \in \mathbb{R}^{P_{\mathrm{short}} \times N_{\mathrm{short}}}$, where $P_{\mathrm{short}}$ denotes the short patch length and $N_{\mathrm{short}}$ represents the number of short-term patches. This output encapsulates shorter-period information. Concurrently, the identical operation was reapplied to extract global features, capturing macroscopic long-period information and thereby forming $x_p^{\mathrm{long},(i)} \in \mathbb{R}^{P_{\mathrm{long}} \times N_{\mathrm{long}}}$.
To achieve scale fusion, AAP was employed to adjust the dimensionality of $x_p^{\mathrm{long},(i)}$, aligning it with $x_p^{\mathrm{short},(i)}$ in the feature space, as follows:
$$x_p^{\mathrm{final},(i)} = \mathrm{AAP}\big(x_p^{\mathrm{long},(i)}\big) + x_p^{\mathrm{short},(i)}$$
This step ensures the effective fusion of $x_p^{\mathrm{short},(i)}$ and $x_p^{\mathrm{long},(i)}$, preserving the long-term information in $x_p^{\mathrm{long},(i)}$, reducing redundancy, and yielding compact, efficient fused features. The adaptively average-pooled $x_p^{\mathrm{long},(i)}$ is then added to $x_p^{\mathrm{short},(i)}$, generating $x_p^{\mathrm{final},(i)} \in \mathbb{R}^{P_{\mathrm{short}} \times N_{\mathrm{short}}}$. $x_p^{\mathrm{final},(i)}$ inherits short-term scale information from $x_p^{\mathrm{short},(i)}$ while incorporating the global perspective of $x_p^{\mathrm{long},(i)}$, providing comprehensive data support for subsequent time-series prediction. This module integrates short- and long-period scale information, enhancing the model's ability to analyze complex multi-period patterns and complex time-series data, thereby ensuring prediction accuracy and robustness.
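A minimal sketch of this fusion follows, under the assumption that AAP is realized with 2-D adaptive average pooling over the (patch length, patch count) axes; the exact pooling variant is not specified in the text:

```python
import torch
import torch.nn.functional as F

def scale_fuse(xp_short: torch.Tensor, xp_long: torch.Tensor) -> torch.Tensor:
    """Align the long-period patches to the short-period patch shape with
    adaptive average pooling, then fuse by element-wise addition.

    xp_short: (batch, P_short, N_short)
    xp_long:  (batch, P_long,  N_long)
    Returns   (batch, P_short, N_short)
    """
    # adaptive_avg_pool2d maps the last two dims onto (P_short, N_short)
    target = xp_short.shape[-2:]
    xp_long_aligned = F.adaptive_avg_pool2d(xp_long.unsqueeze(1), target).squeeze(1)
    return xp_long_aligned + xp_short

xp_short = torch.randn(8, 48, 12)   # e.g. P_short = 48, N_short = 12
xp_long  = torch.randn(8, 288, 2)   # e.g. P_long = 288, N_long = 2
print(scale_fuse(xp_short, xp_long).shape)  # torch.Size([8, 48, 12])
```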
Transformer encoder: The proposed model uses a Transformer encoder, which maps the observed signals to latent representations. Through a trainable linear projection $W_p \in \mathbb{R}^{D \times P}$, $x_p^{\mathrm{final},(i)}$ is mapped to the $D$-dimensional latent space of the Transformer, and a learnable additive position encoding $W_{pos} \in \mathbb{R}^{D \times N}$ is applied to encode the temporal order of $x_p^{\mathrm{final},(i)}$. The final input to the Transformer encoder is defined as $x_d^{(i)} = W_p \, x_p^{\mathrm{final},(i)} + W_{pos}$, where $W_p$ is the trainable projection matrix, $D$ denotes the dimensionality of the latent space, $P$ is determined by the patch length of $x_p^{\mathrm{final},(i)}$, and $d$ indicates the index of the input feature to the Transformer encoder.
In multi-head attention, each head $h = 1, \ldots, H$ generates a query matrix $Q_h^{(i)} = (x_d^{(i)})^T W_h^Q$, a key matrix $K_h^{(i)} = (x_d^{(i)})^T W_h^K$, and a value matrix $V_h^{(i)} = (x_d^{(i)})^T W_h^V$. Here, $W_h^Q$, $W_h^K$, and $W_h^V$ denote the query, key, and value projection matrices of the $h$-th attention head, with $W_h^Q, W_h^K \in \mathbb{R}^{D \times d_k}$ and $W_h^V \in \mathbb{R}^{D \times D}$, where $d_k$ (the dimension of $Q$ and $K$) satisfies $d_k = D/H$ ($H$ is the number of heads). The scaled dot-product operation is then performed to obtain the attention output $O_h^{(i)} \in \mathbb{R}^{D \times N}$, where $N$ is the sequence-length dimension and $D$ is the Transformer's latent space dimension.
The multi-head attention module incorporates a batch normalization layer and a feed-forward network with residual connections (Figure 3), generating a representation $Z^{(i)} \in \mathbb{R}^{D \times N}$. The prediction result $y^{(i)} \in \mathbb{R}^{1 \times T}$ is obtained via a flattening layer and a linear head. Here, $T$ denotes the prediction length (future time steps), $\hat{x}$ is the predicted value, $L+1$ is the first future prediction step, and $L+T$ is the $T$-th future step (with $L$ as the historical data length):
$$y^{(i)} = \mathrm{LinearHead}\big(\mathrm{Flatten}\big(Z^{(i)}\big)\big) = \big(\hat{x}_{L+1}^{(i)}, \ldots, \hat{x}_{L+T}^{(i)}\big)$$
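The projection, positional encoding, encoding, and flatten-plus-linear-head pipeline can be sketched as follows. The dimensions and layer counts are illustrative assumptions, and a stock PyTorch encoder layer (which applies layer normalization) stands in for the encoder described here, which uses batch normalization:

```python
import torch
import torch.nn as nn

class BackboneSketch(nn.Module):
    """Minimal sketch of the encoding/prediction path: project fused patches
    to a D-dimensional latent space, add learnable positional encodings, run
    a Transformer encoder, then flatten and map to T future steps."""

    def __init__(self, patch_len=48, n_patches=12, d_model=128,
                 n_heads=8, n_layers=3, horizon=24):
        super().__init__()
        self.proj = nn.Linear(patch_len, d_model)                  # W_p: P -> D
        self.pos = nn.Parameter(torch.zeros(n_patches, d_model))   # W_pos
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(n_patches * d_model, horizon)        # flatten + linear head

    def forward(self, xp_final):                       # (batch, P, N)
        tokens = self.proj(xp_final.transpose(1, 2)) + self.pos  # (batch, N, D)
        z = self.encoder(tokens)                                 # (batch, N, D)
        return self.head(z.flatten(1))                           # (batch, T)

y = BackboneSketch()(torch.randn(8, 48, 12))
print(y.shape)  # torch.Size([8, 24])
```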
Normalization: To effectively handle the non-stationary characteristics inherent in the parking data (identified in Section 4.1.1), SFFormer employs an instance normalization strategy applied to each time-series sample independently: each instance $x^{(i)}$ is normalized to zero mean and unit standard deviation before patching, and its mean and standard deviation are reintroduced into the output prediction. The process consists of two stages:
(a) Input normalization: At the input stage, each time-series sample is normalized. The series' mean is first computed and subtracted to eliminate the overall trend; the series' standard deviation is then calculated and divided out to unify its volatility. Each input series is thus transformed into a standard form with zero mean and unit standard deviation, allowing the model to focus on learning the universal, scale-independent patterns embedded in the data.
(b) Output de-normalization: At the output stage, the inverse operation restores predicted values to the original data scale. The model's standardized prediction output is first multiplied by the previously stored standard deviation (to recover the series' volatility), and the previously stored mean is then added (to restore the baseline level). Through this de-normalization step, the output is converted from the standardized space back to values with real-world physical meaning.
The instance-wise normalization and de-normalization mechanism is a key design choice that grants SFFormer robustness against the trends and scale shifts commonly found in real-world time series, enabling accurate forecasting without the need for manual data differencing.
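A compact sketch of this two-stage, per-instance scheme (akin to RevIN; the small epsilon is an implementation detail added here for numerical stability):

```python
import torch

class InstanceNorm:
    """Per-sample normalization and de-normalization, sketching the
    two-stage scheme described above."""

    def normalize(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, L); store per-instance stats for later de-normalization
        self.mean = x.mean(dim=-1, keepdim=True)
        self.std = x.std(dim=-1, keepdim=True) + 1e-5  # avoid division by zero
        return (x - self.mean) / self.std

    def denormalize(self, y: torch.Tensor) -> torch.Tensor:
        # y: (batch, T) standardized predictions -> original scale
        return y * self.std + self.mean

norm = InstanceNorm()
x = torch.randn(8, 576) * 40 + 200          # counts with a non-zero baseline
x_hat = norm.normalize(x)                   # zero mean, unit std per sample
y = norm.denormalize(torch.randn(8, 24))    # restore scale and baseline
```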

4. Results

In this section, comprehensive experiments were conducted to compare the proposed method with industry-leading prediction approaches. These experiments used the same dataset and evaluation criteria. Subsequently, a detailed analysis of the experimental results was carried out to demonstrate the optimizations and improvements of our solution compared to other alternatives.

4.1. Experimental Data and Data Analysis

The dataset used contained parking occupancy data from 29 parking lots in Guangzhou, China, spanning 1 to 30 June 2018, at a minimum resolution of five minutes. These parking lots included nine commercial building lots (C1–C9), three hospital lots (H1–H3), five office building lots (O1–O5), three sports and entertainment facility lots (S1–S3), four tourist attraction lots (T1–T4), and five residential area lots (R1–R5). The dataset was sourced from Li et al. [21].
A heat map is a visualization tool that intuitively displays data distribution density, frequency, or intensity through color gradients. In parking management, color gradients enable intuitive representation of parking space occupancy density or time-based usage frequency across different areas of a facility. This allows managers to monitor real-time parking space distribution trends, quickly identify high-activity hotspots or long-term idle zones, optimize parking guidance strategies, assist in formulating time-based pricing schemes, and provide visual data support for long-term planning (e.g., parking lot expansion and charging station deployment). Consequently, it enhances parking resource utilization and improves user parking experience.
The multidimensional parking occupancy heatmap for the C1 area (Figure 4) illustrates temporal patterns through a color gradient ranging from deep blue (indicating low occupancy) to deep red (signifying high occupancy). Dates spanning 1–30 June (e.g., 1 June [Thursday]) are displayed along the vertical axis, while hourly intervals from 00:00 to 23:00 appear on the horizontal axis. Nighttime periods (00:00–06:00) exhibited sustained low occupancy (deep blue); weekends maintained predominantly low-occupancy characteristics, but special events triggered anomalous fluctuations, such as demand surges or drops. This visualization revealed distinct diurnal and weekly periodicities alongside short-term anomalies, demonstrating the multiscale nature of parking behavior. Consequently, the analysis underscored the critical importance of capturing multi-timescale patterns for accurate occupancy prediction, providing essential visual evidence for understanding parking dynamics.
The comparative line chart illustrating parking demand variations across different date types in Area C1 shows the hourly changes (from 00:00 to 20:00) in parking occupancy rates on a weekday (Monday, 5 June), a weekend (Saturday, 3 June), and a special event day (Sunday, 11 June). The horizontal axis represents hours, while the vertical axis indicates parking occupancy rates. Three distinct trend lines capture the demand patterns associated with each date type. On weekdays, parking demand displays a clear twin-peak pattern corresponding to morning and evening commuting hours. In contrast, the weekend curve is relatively flat, with peaks occurring later in the day and showing lower intensity. Special event days exhibit irregular fluctuations with no consistent pattern, including atypical peaks. This chart intuitively highlights significant differences in parking behavior across date types, demonstrating that parking demand is influenced by factors such as commuting routines, leisure activities, and special events. It underscores the complexity and variability of parking dynamics and provides a basis for understanding and forecasting parking demand under diverse conditions. Consequently, it emphasizes the necessity for predictive models to capture varying temporal patterns, as illustrated in Figure 5.
Parking demand prediction faces multi-level and complex challenges. Heatmap visualizations clearly reveal that parking demand demonstrates significant multi-scale temporal periodicity—such as intra-day peaks and troughs, variations between weekdays and weekends, and seasonal fluctuations—as well as strong spatial correlations. Although short-term models can respond quickly to instantaneous demand, they often fail to capture long-term trends. The evident heat islands and cold spots in the heatmaps visually reflect the intertwined effects of these complex factors. Therefore, prediction models are required not only to be sensitive to short-term fluctuations but also to retain memory of long-term patterns. In addition, breakthroughs are needed in areas such as multi-source data fusion, spatial correlation modeling, and anomaly detection. Only through these advances can accurate parking demand prediction be achieved, providing reliable support for the development of smart cities.

4.1.1. Stationarity Analysis

In addition to observing cyclical patterns, a formal analysis of the data's underlying statistical properties is crucial for robust model development. To this end, we first selected a subset of datasets (C1–C8) from all our parking datasets and then conducted a comprehensive stationarity analysis on their raw parking occupancy time series using two complementary statistical tests: the augmented Dickey–Fuller (ADF) test (a standard unit-root test, where a significant p-value rejects the unit-root null and thus indicates stationarity) and the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test (which takes stationarity as its null hypothesis, so a non-significant p-value is consistent with stationarity), as shown in Table 1.
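Both tests are available in statsmodels; the sketch below illustrates the Table 1 procedure on a synthetic series. Note that statsmodels clips reported KPSS p-values to the [0.01, 0.1] lookup-table range, which is why several entries in Table 1 read exactly 0.1:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

def stationarity_report(series: np.ndarray, name: str = "C1") -> None:
    """Run the two complementary tests used in Table 1.

    ADF null hypothesis:  unit root (non-stationary) -> small p rejects it.
    KPSS null hypothesis: stationary                 -> small p rejects it.
    """
    adf_stat, adf_p = adfuller(series)[:2]
    # KPSS p-values are interpolated from a table and clipped to [0.01, 0.1]
    kpss_stat, kpss_p = kpss(series, regression="c", nlags="auto")[:2]
    stationary = adf_p < 0.05 and kpss_p > 0.05
    print(f"{name}: ADF={adf_stat:.4f} (p={adf_p:.2e}), "
          f"KPSS={kpss_stat:.4f} (p={kpss_p:.4f}) -> "
          f"{'Stationary' if stationary else 'Non-Stationary'}")

# Synthetic stationary-ish example in place of a real occupancy series
stationarity_report(np.sin(np.linspace(0, 60, 2000)) + np.random.rand(2000))
```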
The stationarity of each dataset's time series was jointly verified via the ADF test (null hypothesis: a unit root exists, i.e., the series is non-stationary) and the KPSS test (null hypothesis: the series is stationary). In the ADF test, the p-values of all datasets are far below 0.05, strongly rejecting the null hypothesis of non-stationarity. In the KPSS test, except for C3 (p-value of 0.01 < 0.05, hence judged non-stationary), the p-values of the other datasets are all above 0.05 (e.g., C1 returns a p-value of 0.1; the actual p-value is larger, since the test statistic falls outside the range of the lookup table), so the null hypothesis of stationarity cannot be rejected. The two tests mutually confirm that, except for C3, the time series of all datasets satisfy stationarity. Moreover, some returned KPSS p-values reflect only the lower bound imposed by the lookup table, which further supports the stationarity conclusion and makes the overall judgment reliable.
The results, summarized in Figure 6, indicate that most of the parking data exhibit stationary characteristics: among the eight time series (C1–C8) analyzed, only C3 is flagged as non-stationary, and specifically by the KPSS test, while the remaining seven series (C1, C2, C4–C8) are deemed stationary. The ADF test judges all eight series stationary; the KPSS test fails C3 but passes C1, C2, and C4–C8, reaffirming their stationary nature.
However, the existence of non-stationary cases (e.g., C3) still poses a challenge for time-series forecasting models that implicitly assume stationary data—this underscores the need for approaches robust to such characteristics, which further justifies the design rationale of our SFFormer model in handling data with mixed stationary and non-stationary properties.
Notably, rather than relying on external pre-processing techniques like differencing, which can sometimes obscure important information, our SFFormer model is intrinsically designed to handle such non-stationary data. This capability is realized through a specific instance normalization mechanism, which is detailed in the model architecture description in Section 3.2. By normalizing each input sample independently, the model can focus on learning the shape and patterns of the time series, making it robust to the trends and scale shifts inherent in the data.

4.2. Performance Comparison

4.2.1. Experimental Results and Analysis

Our proposed SFFormer model outperforms the baseline methods in most prediction tasks. Table 2 presents the prediction results for the number of available parking spaces on the C1 parking lot dataset (only C1 results are shown), with prediction horizons ranging from 24 to 720 steps. Compared with the best-performing Transformer-based baseline, SFFormer continues to demonstrate superior performance. Moreover, in comparison with non-Transformer models such as DLinear [22], TimesNet [15], and Koopa [23], SFFormer consistently maintains a performance advantage across most evaluation scenarios.
The research findings indicate that the SFFormer model demonstrates substantial advantages across multiple datasets and prediction horizons. On the C1 dataset, the growth rate of MSE in long-term forecasting is significantly lower than that of iTransformer [24], while PatchTST [25] and DLinear [22] each perform well in short-term prediction. Figure 7 illustrates MSE comparisons among four models (SFFormer, PatchTST, iTransformer, and DLinear) under different prediction horizons on the C1 dataset.
In this study, Mean Absolute Error (MAE) and Mean Squared Error (MSE) are employed as primary evaluation metrics to comprehensively assess prediction performance. MAE calculates the average absolute deviation between predicted and actual values. It intuitively reflects the overall error magnitude and remains robust to outliers, making it suitable for evaluating general prediction stability. MSE, on the other hand, squares the prediction errors to amplify the influence of large deviations. This characteristic is particularly valuable for long-term parking space forecasting, where significant errors may severely impact resource scheduling. Together, these two metrics provide a balanced evaluation of model accuracy by capturing both average deviation and extreme errors, thereby ensuring reliable performance across various prediction horizons.
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$

where $n$ denotes the number of samples, $y_i$ the true value of the $i$-th sample, $\hat{y}_i$ the predicted value of the $i$-th sample, and $|\cdot|$ the absolute value operator.
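Both metrics are straightforward to compute; a minimal NumPy version for reference:

```python
import numpy as np

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean Absolute Error: average absolute deviation, robust to outliers."""
    return float(np.mean(np.abs(y_true - y_pred)))

def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean Squared Error: squares deviations, penalizing large errors."""
    return float(np.mean((y_true - y_pred) ** 2))

y_true = np.array([0.30, 0.55, 0.80])
y_pred = np.array([0.28, 0.60, 0.70])
print(mae(y_true, y_pred), mse(y_true, y_pred))  # 0.0567, 0.0043
```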
The analysis shows that, on the C1 parking lot dataset, our proposed SFFormer model achieved the best MSE in tasks with prediction horizons of 24, 48, 96, 144, 216, 288, and 720 steps. Only in the 576-step prediction task did the PatchTST model outperform SFFormer (PatchTST's MSE of 0.0945 was marginally lower than SFFormer's 0.0950). Across all prediction horizons, SFFormer demonstrated robust consistency: it ranked first in MSE for 7 of the 8 tasks, with the 576-step task as the only exception (ranking second). Table 2 presents a comparison of experimental results for available parking space prediction on the C1 dataset, where boldface indicates the optimal result and underlining the sub-optimal one.
In the comparison of MSE on the C1 dataset, the SFFormer model demonstrates increasingly significant long-term prediction advantages as the prediction horizon grows. Through the scale-fusion module and Transformer-encoder architecture, the model validates its ability to effectively capture temporal features and its superior prediction accuracy in medium-to-long-term parking availability prediction tasks. Figure 7 visually presents the comparison of prediction performance among the four models across different prediction horizons on the C1 dataset.
From an overall perspective, as the prediction step size increases, the SFFormer model’s prediction curve tends to align more closely with the ground truth. It successfully captures the variation patterns in parking space occupancy across different step sizes, demonstrating the model’s effectiveness and adaptability in multi-step forecasting for parking lot C1. These results validate the model’s capability to consistently track data fluctuations and fit real-world trends in single-parking-lot prediction tasks, further reinforcing the performance advantages discussed in this study.

4.2.2. Baseline Model and Experimental Setup

To identify the optimal solution for forecasting available parking-space time series, a set of advanced forecasting models was selected for baseline comparison: the Transformer-based PatchTST [25] and Autoformer [10], the recently proposed iTransformer [24], and the non-Transformer architecture DLinear [22]. Each of these models offers distinct architectural advantages, providing diverse and complementary perspectives for comparative evaluation.
To ensure the fairness and reproducibility of the experimental results, a unified experimental setup was adopted for all comparisons. For the four distinct parking lot datasets, experiments were conducted under multiple prediction horizons, $T \in \{24, 48, 96, 144, 216, 288, 576, 720\}$, corresponding to durations of 2 h, 4 h, 8 h, 12 h, 18 h, 1 day, 2 days, and 2.5 days, respectively. These settings simulate forecasting requirements across various temporal scales. A lookback window of 576 time steps (equivalent to 2 days) was applied to ensure that the models could adequately capture historical patterns. For model performance evaluation, two widely accepted time series forecasting metrics were employed: MSE quantifies the average squared deviation between predicted and actual values, and MAE the average absolute deviation. By jointly analyzing these two metrics, a more comprehensive assessment of each model's performance in time series forecasting tasks was achieved.
To analyze the impact of the key hyperparameters on the experimental results, dataset C1 was selected as a representative case. On this dataset, the key hyperparameters $p_s$ and $p_l$ were varied, and multi-step predictions ranging from 24 to 720 steps were evaluated using MSE. It was observed that prediction errors were lower when the lengths of $p_s$ and $p_l$ formed either a multiple or a complementary relationship, such as $p_s = 48$ and $p_l = 288$, or $p_s = 144$ and $p_l = 576$. The corresponding results are presented in Table 3 and Table 4.
In the SFFormer model, the optimal configurations of $p_s$ and $p_l$ exhibit significant scale-matching patterns with the prediction steps. As seen in Table 3 and Table 4 (comparative results of hyperparameter adjustment on the C1 dataset), when $p_l$ is fixed, short-term predictions (e.g., 24 steps) suit medium or smaller $p_s$, while long-term predictions (e.g., 720 steps) require a larger $p_s$ to capture multi-day trends. Meanwhile, changes in $p_l$ regulate the optimal selection of $p_s$, and the two must co-adapt to balance short-term fluctuations and long-term patterns. This pattern confirms the adaptability of the scale-fusion mechanism to complex parking time-series patterns and provides a basis for hyperparameter selection.
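The tuning behind Tables 3 and 4 can be framed as a per-horizon grid search over $(p_s, p_l)$ pairs; the sketch below is hypothetical, with a random placeholder standing in for an actual training-and-evaluation run:

```python
import itertools
import random

P_SHORT = [16, 32, 48, 96, 144]   # candidate short patch lengths (illustrative)
P_LONG = [144, 288, 576]          # candidate long patch lengths (illustrative)
HORIZONS = [24, 48, 96, 144, 216, 288, 576, 720]

def train_and_eval(p_s: int, p_l: int, horizon: int) -> float:
    # Placeholder: a real run would train SFFormer with (p_s, p_l)
    # and return its test MSE for this horizon.
    return random.random()

random.seed(0)
best = {}
for T in HORIZONS:
    scores = {(p_s, p_l): train_and_eval(p_s, p_l, T)
              for p_s, p_l in itertools.product(P_SHORT, P_LONG)
              if p_s < p_l}                 # the short patch must be shorter
    best[T] = min(scores, key=scores.get)   # lowest-MSE pair for this horizon
print(best)
```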
The MSE metrics across varying combinations of $p_s$ and $p_l$ under different prediction horizons (24–720 steps) demonstrate significant scale-alignment patterns, as shown in Figure 8.
The optimal $p_s$ value exhibits an overall upward trend with increasing prediction length: short-term predictions (24/48 steps) favor moderate or smaller $p_s$ values (16/32/48), whereas long-term predictions (576/720 steps) require larger values (96/144), reflecting the model's reliance on broader short-term segments to capture long-term trends.
This relationship is further influenced by $p_l$, which dynamically regulates the selection of the optimal $p_s$, demonstrating multi-scenario adaptability. When $p_l$ is small, short-term predictions converge at $p_s = 48$, while long-term predictions shift to $p_s = 96$. At $p_l = 288$, medium-to-long-term predictions (288/576 steps) prefer $p_s = 32/48$, indicating synergy between medium-scale short-term patterns and extended cycles. At the maximum $p_l = 576$, ultra-long-term predictions (720 steps) achieve peak performance at $p_s = 144$, forming a long-period, wide-short-patch matching pattern. Suboptimal results consistently neighbor the optimal $p_s$ values, while extreme configurations (excessively large or small) cause significant error degradation, validating SFFormer's long-short period fusion mechanism.
By dynamically adjusting the scale-fusion parameters, the model can accurately adapt to diverse forecasting horizons, achieving a balance between capturing short-term fluctuations and modeling long-term trends, thereby enhancing prediction accuracy. This scale alignment and dynamic coordination are visually manifested through the consistently lower MSE of optimal parameter combinations compared to suboptimal or invalid configurations. Experiments fully validate the adaptability of the scale-fusion module to complex parking time-series characteristics, providing interpretable regularities for hyperparameter optimization and ensuring a balance between performance and generalization ability in multi-timescale tasks.

4.2.3. Ablation Experiment

To verify the reliability of the proposed long-short-period Scale-Fusion Transformer (SFFormer), the C1 parking dataset was used as an example. The task focused on predicting the number of available parking spaces at horizons from 24 to 720 steps, with MSE as the performance metric. The full long-short-period scale-fusion model was compared with variants in which its main modules were removed. As shown in Table 5, each SFFormer module significantly impacts the overall performance of the model, validating its effectiveness.
To validate the effectiveness of each key component in our model, we conducted comprehensive ablation studies on the C1 dataset, as summarized in Table 5. The results demonstrate that the SFFormer model, which integrates both long- and short-period mechanisms, significantly outperforms the single-scale baseline in multi-step prediction tasks for parking lot C1, achieving an average performance improvement of 20.89%. This clearly underscores the superiority of the proposed approach.
Specifically, in the configuration without $p_l$, the model loses its ability to capture global temporal patterns, resulting in a noticeable degradation in performance. Conversely, when $p_s$, which is responsible for capturing fine-grained local variations and short-term fluctuations, is removed, the model retains some capacity to perceive long-term trends but exhibits a clear decline in prediction accuracy due to the lack of detailed modeling. Furthermore, in the setting without AAP, where the AAP module is replaced with simple feature concatenation while the dual-branch structure is retained, the model still benefits from multi-scale features but is limited by the absence of an adaptive fusion mechanism, leading to suboptimal performance.
The complete SFFormer model effectively captures multi-scale features through the $p_l$ and $p_s$ branches and intelligently integrates them via the AAP module, achieving the best performance across all evaluation metrics. These results not only quantitatively affirm the advantage of SFFormer over the single-scale model but also validate the efficacy of long-short period fusion in improving prediction accuracy and stability. Overall, the ablation study strongly demonstrates the necessity and effectiveness of the proposed multi-scale architecture and adaptive fusion strategy.

4.2.4. Error Distribution Characteristics Experiment

In the error distribution experiment, six dataset categories from different functional scenarios were selected: commercial areas (C1–C9), hospitals (H1–H3), office areas (O1–O5), residential areas (R1–R5), sports and entertainment facilities (S1–S3), and tourist attractions (T1–T4). These categories cover typical urban parking demand scenarios, and all data derive from historical occupancy records of real parking lots. Based on this dataset, we calculated the MSE metrics under forecasting horizons ranging from 24 steps (2 h) to 720 steps (2.5 days). This experiment aimed to visualize the prediction error distribution of the same model across diverse datasets using boxplots, to verify the model's adaptability to parking demand in different functional areas, and to assess its generalization performance and stability in complex scenarios. The detailed error distributions are presented in Figure 9.
This experiment employed boxplots to intuitively present the error characteristics of the SFFormer model across the dataset types (Figure 9). The shorter boxes observed for commercial areas and sports and entertainment facilities indicate smaller fluctuations in parking demand prediction errors, reflecting the model's ability to capture parking behaviors with strong periodicity and clear regularity. In contrast, the more dispersed boxes, including outliers, for hospitals and residential areas suggest higher randomness in non-commuting parking demand, implying that the model's errors were influenced by short-term disturbances in these contexts. Lower median errors were demonstrated in office areas and tourist attractions, indicating better overall prediction accuracy there. Comparing error distributions across scenarios identified performance bottlenecks, such as the extreme errors occurring in residential areas, and points to directions for optimization; for example, the model's capacity to capture short-term fluctuations in residential areas requires improvement. Overall, the boxplot analysis systematically verified the generalization ability and scenario adaptability of the SFFormer model: stable performance was observed in areas with pronounced demand patterns, such as commercial districts, office zones, and sports and entertainment facilities, while further improvements are needed in scenarios exhibiting large demand fluctuations. This analysis quantified scenario-specific performance differences and supports targeted model enhancements, facilitating the accurate application of the SFFormer model in urban parking resource scheduling.

5. Discussion

The SFFormer model exhibits significant advantages in medium-to-long-term parking space prediction. Its core mechanism lies in the effective integration of short-term fluctuations and long-term trends through a scale-fusion module for long- and short-term patterns; the reduction in redundant information via an adaptive data compression mechanism; and the enhancement of the model’s ability to capture long-sequence features by combining with a Transformer encoder. On the C1 dataset, for prediction tasks with 24 to 720 steps, SFFormer achieves a 6.0–53.0% reduction in MSE compared to iTransformer [24]. These integrated designs—scale-aware feature fusion, adaptive redundancy reduction, and Transformer-driven long-sequence encoding—synergistically optimize prediction accuracy and computational efficiency, thereby achieving a robust balance between these two critical metrics in time-series forecasting tasks.
The model supports intelligent parking management strategies such as dynamic pricing, time-shared parking, and particularly EV charging scheduling by accurately forecasting parking space availability 2 h to 2 days in advance. In terms of EV charging scheduling, the precise prediction of parking space availability provided by SFFormer can be further combined with intelligent charging path optimization methods. For instance, Ren et al. [26] proposed the Intelligent Charging Scheme Maximizing the Quality Utility (ICMQU), which optimizes the charging paths of mobile chargers by comprehensively considering both the data quantity and quality of sensing nodes, and balances the workload among multiple chargers. When combined with SFFormer’s accurate prediction of parking space availability, such synergy not only enhances the energy replenishment efficiency of electric vehicles in parking lots but also contributes to the overall optimization of resources.
However, the model still has several limitations. These limitations stem primarily from its reliance on learning patterns from historical time-series data alone. For example, the model does not account for real-world feedback loops; interventions like time-dependent parking fees or dynamic guidance systems can actively alter parking patterns, introducing complexities that our current auto-regressive model does not capture. This inherent focus on past patterns also defines the model’s performance boundaries: while SFFormer excels in regular time-series forecasting tasks with multi-periodicity, it has limitations in responding to highly random and sudden anomalous events that have no precedent in the training data. This is in addition to other constraints like restricted data coverage and the lack of modeling for spatial correlations.
Future research should explore integrating multi-source urban dynamics, such as public sentiment fluctuations extracted via large-scale information fusion [27], to enhance prediction robustness against anomalous events and improve dynamic reliability assessment. To address the aforementioned limitations, a crucial direction is to extend the framework into a multivariate model that incorporates external event variables or an integrated anomaly detection module, improving resilience against unexpected disruptions; multi-source data fusion would bring dynamic external variables, such as real-time pricing and traffic guidance data, into the model as explicit inputs, enhancing its real-world applicability. Further priorities include cross-domain generalization, lightweight deployment, and enhanced interpretability. The model's multi-scale feature fusion strategy offers methodological insights for urban dynamic resource management, and follow-up studies may extend this framework to spatio-temporal joint modeling, constructing a three-dimensional time-space-feature prediction architecture to advance the refinement of smart city management.

6. Conclusions

This paper proposes the SFFormer model, which effectively captures long- and short-term temporal characteristics of time-series data through a scale-fusion module and Transformer-encoder architecture. The model decomposes parking-space time series into multi-scale feature representations, adapts to multi-task demands via adaptive data compression, and enhances long-sequence prediction capabilities using a Transformer encoder. Experiments on real parking lot datasets demonstrate that SFFormer outperforms state-of-the-art models, including Autoformer [10], DLinear [22], iTransformer [24], and PatchTST [25], in medium-to-long-term prediction tasks. Notably, it achieves optimal performance at prediction lengths of 24–720 steps (2 h to 2.5 days). The model's fusion of short-term fluctuations with long-term trends provides a robust solution for intelligent parking resource management, thereby optimizing space allocation and supporting urban traffic planning.

Author Contributions

Conceptualization, J.C. and B.Y.; Methodology, J.C. and S.L.; Software, M.W., S.L. and Y.C.; Writing—Original Draft, M.W. and S.L.; Writing—Review and Editing, J.C. and M.W.; Supervision, W.L. and B.Y.; Project Administration, J.C. and B.Y.; Funding Acquisition, J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under grant number 62202160 and the Major Program Project of Xiangjiang Laboratory under grant number 24XJJCYJ01001, with J. C. as the funder overseeing these grants.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SFFormer: Scale-Fusion Transformer
AAP: Adaptive Average Pooling
ADF: Augmented Dickey–Fuller
KPSS: Kwiatkowski–Phillips–Schmidt–Shin
MSE: Mean Squared Error
MAE: Mean Absolute Error
$p_s$: short-period patch length ($patch_{short}$)
$p_l$: long-period patch length ($patch_{long}$)

References

1. Zhang, L.; Wang, B.; Zhang, Q.; Zhu, S.; Ma, Y. Parking Lot Traffic Prediction Based on Fusion of Multifaceted Spatio-Temporal Features. Sensors 2024, 24, 4971.
2. Wang, T.; Li, S.; Li, W.; Yuan, Q.; Chen, J.; Tang, X. A Short-Term Parking Demand Prediction Framework Integrating Overall and Internal Information. Sustainability 2023, 15, 7096.
3. Shang, K.; Wan, Z.; Zhang, Y.; Cui, Z.; Zhang, Z.; Jiang, C.; Zhang, F. Intelligent Short-Term Multiscale Prediction of Parking Space Availability Using an Attention-Enhanced Temporal Convolutional Network. ISPRS Int. J. Geo-Inf. 2023, 12, 208.
4. Bontempi, G.; Birattari, M.; Bersini, H. Local Learning for Iterated Time-Series Prediction. In Proceedings of the Sixteenth International Conference on Machine Learning, Bled, Slovenia, 27–30 June 1999; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1999; pp. 32–38.
5. Song, C.; Lin, Y.; Guo, S.; Wan, H. Spatial-Temporal Synchronous Graph Convolutional Networks: A New Framework for Spatial-Temporal Network Data Forecasting. Proc. AAAI Conf. Artif. Intell. 2020, 34, 914–921.
6. Zhang, W.; Liu, H.; Liu, Y.; Zhou, J.; Xiong, H. Semi-Supervised Hierarchical Recurrent Graph Neural Network for City-Wide Parking Availability Prediction. Proc. AAAI Conf. Artif. Intell. 2020, 34, 1186–1193.
7. Guo, Q.; Fang, L.; Wang, R.; Zhang, C. Multivariate Time Series Forecasting Using Multiscale Recurrent Networks with Scale Attention and Cross-Scale Guidance. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 540–554.
8. Zhou, T.; Ma, Z.; Wang, X.; Wen, Q.; Sun, L.; Yao, T.; Yin, W.; Jin, R. FiLM: Frequency improved Legendre Memory Model for Long-term Time Series Forecasting. Adv. Neural Inf. Process. Syst. 2022, 35, 12677–12690.
9. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. Proc. AAAI Conf. Artif. Intell. 2021, 35, 11106–11115.
10. Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. In Advances in Neural Information Processing Systems 34, Proceedings of the Annual Conference on Neural Information Processing Systems 2021 (NeurIPS 2021), Online, 6–14 December 2021; Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W., Eds.; Neural Information Processing Systems Foundation Inc.: San Diego, CA, USA, 2021; pp. 22419–22430.
11. Zhang, D.; Qiao, Y.; She, L.; Shen, R.; Ren, J.; Zhang, Y. Two Time-Scale Resource Management for Green Internet of Things Networks. IEEE Internet Things J. 2019, 6, 545–556.
12. Balmer, M.; Weibel, R.; Huang, H. Value of incorporating geospatial information into the prediction of on-street parking occupancy: A case study. Geo-Spat. Inf. Sci. 2021, 24, 438–457.
13. Feng, Y.; Tang, Z.; Xu, Y.; Krishnamoorthy, S.; Hu, Q. Predicting vacant parking space availability zone-wisely: A densely connected ConvLSTM method. In Proceedings of the 2021 IEEE Vehicle Power and Propulsion Conference (VPPC), Gijon, Spain, 25–28 October 2021; pp. 1–6.
14. Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; Jin, R. FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting. In Proceedings of the International Conference on Machine Learning, ICML 2022, Baltimore, MD, USA, 17–23 July 2022; pp. 27268–27286.
15. Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; Long, M. TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. In Proceedings of the Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, 1–5 May 2023.
16. Park, C.; Lee, C.; Bahng, H.; Tae, Y.; Kim, K.; Jin, S.; Ko, S.; Choo, J. ST-GRAT: A Novel Spatio-Temporal Graph Attention Networks for Accurately Forecasting Dynamically Changing Road Speed. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management 2020, Galway, Ireland, 19–23 October 2020; pp. 1215–1224.
17. Liu, C.; Hettige, K.H.; Xu, Q.; Long, C.; Xiang, S.; Cong, G. ST-LLM+: Graph Enhanced Spatio-Temporal Large Language Models for Traffic Prediction. IEEE Trans. Knowl. Data Eng. 2025, 37, 4846–4859.
18. Ju, W.; Zhao, Y.; Qin, Y.; Yi, S.; Yuan, J.; Xiao, Z.; Luo, X.; Yan, X.; Zhang, M. COOL: A Conjoint Perspective on Spatio-Temporal Graph Neural Network for Traffic Forecasting. Inf. Fusion 2024, 107, 102341.
19. Jin, Y.; Chen, K.; Yang, Q. Transferable Graph Structure Learning for Graph-based Traffic Forecasting Across Cities. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2023, Long Beach, CA, USA, 5 August 2023; pp. 1032–1043.
20. Feng, Y.; Qin, Y.; Zhao, S. Correlation-split and Recombination-sort Interaction Networks for air quality forecasting. Appl. Soft Comput. 2023, 145, 110544.
21. Li, J.; Qu, H.; You, L. An Integrated Approach for the Near Real-Time Parking Occupancy Prediction. IEEE Trans. Intell. Transp. Syst. 2023, 24, 3769–3778.
22. Zeng, A.; Chen, M.; Zhang, L.; Xu, Q. Are Transformers Effective for Time Series Forecasting? Proc. AAAI Conf. Artif. Intell. 2023, 37, 11121–11128.
23. Liu, Y.; Li, C.; Wang, J.; Long, M. Koopa: Learning Non-stationary Time Series Dynamics with Koopman Predictors. In Advances in Neural Information Processing Systems 36, Proceedings of the Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, 10–16 December 2023; Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S., Eds.; Neural Information Processing Systems Foundation Inc.: San Diego, CA, USA, 2023.
24. Liu, Y.; Hu, T.; Zhang, H.; Wu, H.; Wang, S.; Ma, L.; Long, M. iTransformer: Inverted Transformers Are Effective for Time Series Forecasting. In Proceedings of the Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, 7–11 May 2024.
25. Nie, Y.; Nguyen, N.H.; Sinthong, P.; Kalagnanam, J. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. In Proceedings of the Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, 1–5 May 2023.
26. Ren, Y.; Liu, A.; Mao, X.; Li, F. An intelligent charging scheme maximizing the utility for rechargeable network in smart city. Pervasive Mob. Comput. 2021, 77, 101457.
27. Chen, X.; Zhang, W.; Xu, X.; Cao, W. A public and large-scale expert information fusion method and its application: Mining public opinion via sentiment analysis and measuring public dynamic reliability. Inf. Fusion 2022, 78, 71–85.
Figure 1. Comparison of error accumulation in single-step prediction versus global consistency in medium-to-long-term prediction.
Figure 2. Visualization of input and prediction intervals for the Scale-Fusion Transformer parking prediction model.
Figure 3. SFFormer model architecture.
Figure 4. Spatio-temporal heatmap of multi-dimensional parking occupancy in C1.
Figure 5. Comparison of parking demand changes across different day types in C1 (line chart).
Figure 6. Parking occupancy time series stationarity analysis via ADF test (datasets C1–C8).
Figure 7. Performance comparison of four models on the C1 dataset (line chart).
Figure 8. MSE performance of patch parameter combinations across prediction horizons.
Figure 9. Cross-dataset error distribution verification experiment based on the SFFormer model.
Table 1. Time series stationarity test results of datasets C1–C8.

Dataset | ADF Statistic | ADF p-Value | KPSS Statistic | KPSS p-Value | Stationarity
C1 | −8.4060 | 2.17 × 10⁻¹³ | 0.2361 | 0.100000 | Stationary
C2 | −8.5443 | 9.60 × 10⁻¹⁴ | 0.1078 | 0.100000 | Stationary
C3 | −8.8923 | 1.23 × 10⁻¹⁴ | 1.3949 | 0.010000 | Non-Stationary
C4 | −12.5428 | 2.29 × 10⁻²³ | 0.0376 | 0.100000 | Stationary
C5 | −10.3025 | 3.34 × 10⁻¹⁸ | 0.2107 | 0.100000 | Stationary
C6 | −9.0999 | 3.63 × 10⁻¹⁵ | 0.1317 | 0.100000 | Stationary
C7 | −9.5234 | 3.02 × 10⁻¹⁶ | 0.1146 | 0.100000 | Stationary
C8 | −8.4773 | 1.42 × 10⁻¹³ | 0.4448 | 0.057859 | Stationary
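The verdicts in Table 1 are consistent with a joint rule at the 0.05 level: a series is labeled Stationary when the ADF test rejects its unit-root null (p < 0.05) and the KPSS test fails to reject its stationarity null (p > 0.05), which is why C3, with a KPSS p-value of 0.01, is the only non-stationary set. As a minimal sketch of how such a report could be reproduced with statsmodels (the file and column names, and the 0.05 threshold, are illustrative assumptions, not the authors' code):

```python
import pandas as pd
from statsmodels.tsa.stattools import adfuller, kpss

def stationarity_report(series: pd.Series, name: str) -> dict:
    """ADF + KPSS joint stationarity check for one occupancy series."""
    clean = series.dropna()
    adf_stat, adf_p, *_ = adfuller(clean, autolag="AIC")
    # KPSS's null hypothesis is stationarity (the opposite of ADF's).
    # statsmodels interpolates the KPSS p-value from tables bounded at
    # [0.01, 0.1], which matches the many 0.100000 entries in Table 1.
    kpss_stat, kpss_p, _, _ = kpss(clean, regression="c", nlags="auto")
    verdict = "Stationary" if adf_p < 0.05 and kpss_p > 0.05 else "Non-Stationary"
    return {"Dataset": name, "ADF": adf_stat, "ADF p": adf_p,
            "KPSS": kpss_stat, "KPSS p": kpss_p, "Stationarity": verdict}

# Hypothetical usage; "C1.csv" and the "occupancy" column are placeholders.
# df = pd.read_csv("C1.csv")
# print(stationarity_report(df["occupancy"], "C1"))
```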
Table 2. Comparison of experimental results for available parking space prediction on dataset C1. Bold marks the best MSE and MAE values for each prediction horizon, and underline marks the second best.

Horizon | SFFormer (MSE/MAE) | PatchTST (MSE/MAE) | iTransformer (MSE/MAE) | DLinear (MSE/MAE) | Autoformer (MSE/MAE)
24 | 0.0261 / 0.1231 | 0.0293 / 0.1241 | 0.0471 / 0.1491 | 0.0711 / 0.1822 | 0.7937 / 0.6738
48 | 0.0392 / 0.1475 | 0.0460 / 0.1589 | 0.0671 / 0.1667 | 0.0872 / 0.2025 | 0.5512 / 0.6025
96 | 0.0629 / 0.1942 | 0.0728 / 0.2091 | 0.0669 / 0.1787 | 0.0956 / 0.2145 | 0.5432 / 0.6096
144 | 0.0533 / 0.1748 | 0.0766 / 0.2066 | 0.0703 / 0.1894 | 0.0971 / 0.2159 | 0.8022 / 0.7070
216 | 0.0631 / 0.1891 | 0.1004 / 0.2374 | 0.0825 / 0.2101 | 0.0998 / 0.2184 | 0.3983 / 0.4998
288 | 0.0756 / 0.2021 | 0.1021 / 0.2349 | 0.1048 / 0.2391 | 0.1033 / 0.2221 | 0.5683 / 0.5997
576 | 0.0950 / 0.2260 | 0.0945 / 0.2301 | 0.2010 / 0.3131 | 0.1122 / 0.2340 | 0.8805 / 0.7512
720 | 0.1086 / 0.2399 | 0.1108 / 0.2471 | 0.2310 / 0.3337 | 0.1143 / 0.2372 | 0.4174 / 0.5248
Optimal counts | 7 / 6 | 1 / 0 | 0 / 1 | 0 / 1 | 0 / 0
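The MSE and MAE values above are the usual point-forecast metrics averaged over all predicted steps; their magnitudes suggest evaluation on normalized occupancy values, as is common for these benchmarks. A minimal reference implementation (ours, not the authors' evaluation code), assuming predictions and ground truth are arrays of the same shape:

```python
import numpy as np

def mse_mae(y_true: np.ndarray, y_pred: np.ndarray) -> tuple[float, float]:
    """Mean squared and mean absolute error, averaged over every element."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.mean(err ** 2)), float(np.mean(np.abs(err)))

# Applied to a (num_windows, horizon) array of 24-step forecasts, this
# would yield one (MSE, MAE) pair such as the row labeled 24 above.
```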
Table 3. Impact of the p_s hyperparameter on MSE in multi-step prediction for parking lot C1 (p_l = 288). Bold marks the lowest MSE for each prediction horizon, and underline marks the second lowest.

Horizon | p_s = 4 | p_s = 16 | p_s = 32 | p_s = 48 | p_s = 72 | p_s = 96 | p_s = 144
24 | 0.0310 | 0.0261 | 0.0300 | 0.0257 | 0.0271 | 0.0269 | 0.0290
48 | 0.0485 | 0.0392 | 0.0465 | 0.0483 | 0.0449 | 0.0777 | 0.0511
96 | 0.0604 | 0.0629 | 0.0573 | 0.1022 | 0.0641 | 0.0629 | 0.0627
144 | 0.0626 | 0.0533 | 0.0546 | 0.0528 | 0.0596 | 0.0687 | 0.0743
216 | 0.0741 | 0.0631 | 0.0577 | 0.0586 | 0.0618 | 0.0611 | 0.0718
288 | 0.0804 | 0.0756 | 0.0718 | 0.0643 | 0.0657 | 0.0735 | 0.0756
576 | 0.0962 | 0.0950 | 0.0781 | 0.0806 | 0.0840 | 0.0867 | 0.1014
720 | 0.1091 | 0.1086 | 0.0994 | 0.0973 | 0.0989 | 0.0894 | 0.1137
Optimal counts | 0 | 0 | 0 | 3 | 3 | 1 | 0
Table 4. Impact of the p_s hyperparameter on MSE in multi-step prediction for parking lot C1 (p_l = 576). Bold marks the lowest MSE for each prediction horizon, and underline marks the second lowest.

Horizon | p_s = 4 | p_s = 16 | p_s = 32 | p_s = 48 | p_s = 72 | p_s = 96 | p_s = 144
24 | 0.0333 | 0.0274 | 0.0282 | 0.0271 | 0.0282 | 0.0271 | 0.0264
48 | 0.0544 | 0.0446 | 0.0404 | 0.0431 | 0.0454 | 0.0458 | 0.0489
96 | 0.0566 | 0.0544 | 0.0502 | 0.0587 | 0.0624 | 0.0567 | 0.0526
144 | 0.0697 | 0.0674 | 0.0560 | 0.0610 | 0.0567 | 0.0827 | 0.0670
216 | 0.0939 | 0.0833 | 0.0678 | 0.0828 | 0.0756 | 0.0866 | 0.0678
288 | 0.0953 | 0.0982 | 0.0804 | 0.0890 | 0.0876 | 0.0851 | 0.0696
576 | 0.0906 | 0.0921 | 0.0762 | 0.0826 | 0.0735 | 0.0781 | 0.0628
720 | 0.1058 | 0.1182 | 0.0969 | 0.1030 | 0.1996 | 0.1448 | 0.0852
Optimal counts | 0 | 0 | 4 | 1 | 0 | 1 | 5
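Tables 3 and 4 sweep the short-scale patch length p_s under two fixed long-scale lengths p_l, matching the patch-parameter study of Figure 8. The model's exact patching mechanics are defined in the main text; purely as an illustration of viewing one input window at two temporal scales (in the spirit of PatchTST-style patching), a sketch follows. The function name, the non-overlapping stride, and the 720-step window are our assumptions, not the authors' implementation.

```python
import numpy as np

def patch(series: np.ndarray, patch_len: int, stride: int | None = None) -> np.ndarray:
    """Segment a 1-D series into a (num_patches, patch_len) array of windows.

    With stride == patch_len the windows are non-overlapping; applying the
    same input at a short length (p_s) and a long length (p_l) gives the
    model two views of the sequence at different temporal scales.
    """
    stride = stride or patch_len
    n = (len(series) - patch_len) // stride + 1
    return np.stack([series[i * stride : i * stride + patch_len] for i in range(n)])

# Hypothetical usage: one 720-step input window viewed at two scales.
x = np.arange(720, dtype=float)
short_patches = patch(x, patch_len=48)   # p_s = 48  -> 15 patches
long_patches = patch(x, patch_len=288)   # p_l = 288 -> 2 patches
```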
Table 5. Ablation experiment results (MSE) on dataset C1.

Variant \ Prediction step size | 24 | 48 | 96 | 144 | 216 | 288 | 576 | 720
Without p_s | 0.0303 | 0.0559 | 0.0692 | 0.0683 | 0.0762 | 0.0896 | 0.0963 | 0.1326
Without p_l | 0.0293 | 0.0460 | 0.0728 | 0.0766 | 0.1004 | 0.1021 | 0.0945 | 0.1108
Without AAP | 0.0488 | 0.0619 | 0.0859 | 0.0925 | 0.0953 | 0.1101 | 0.1038 | 0.1425
SFFormer | 0.0261 | 0.0392 | 0.0629 | 0.0533 | 0.0631 | 0.0756 | 0.0950 | 0.1086
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
